[RFC,0/3] implement dlmopen hooks for gdb

Message ID 20200626193228.1953-1-danielwa@cisco.com

Message

Cisco Systems, Inc. needs dlmopen support in gdb, which
required glibc changes. I think it was known when glibc implemented
dlmopen that gdb would not work with it.

Since 2015 Cisco has had these patches in our inventory to fix issues in
glibc which prevented this type of gdb usage.

This RFC is mainly to get guidance on this implementation. We have some
individuals who have signed the copyright assignment for glibc, and we
will submit these (or different patches) formally through those channels if
no one has issues with the implementation.

Also included in this are a couple of fixes which went along with the
original implementation.

Please provide any comments you might have.

Conan C Huang (3):
  Segfault when dlopen with RTLD_GLOBAL in dlmopened library
  glibc: dlopen RTLD_NOLOAD optimization
  add r_debug multiple namespaces support

 elf/dl-close.c |  7 ++++++-
 elf/dl-debug.c | 13 ++++++++++---
 elf/dl-open.c  |  8 +++++++-
 elf/link.h     |  4 ++++
 4 files changed, 27 insertions(+), 5 deletions(-)

-- 
2.17.1

Comments

On 6/26/20 3:32 PM, Daniel Walker via Libc-alpha wrote:
> Cisco Systems, Inc. has a need to have dlmopen support in gdb, which
> required glibc changes. I think it was known when glibc implemented
> dlmopen that gdb would not work with it.
>
> Since 2015 Cisco has had these patches in our inventory to fix issues in
> glibc which prevented this type of gdb usage.
>
> This RFC is mainly to get guidance on this implementation. We have some
> individuals who have signed the copyright assignment for glibc, and we
> will submit these (or different patches) formally through those channels if
> no one has issues with the implementation.
>
> Also included in this are a couple of fixes which went along with the
> original implementation.
>
> Please provide any comments you might have.
>
> Conan C Huang (3):
>   Segfault when dlopen with RTLD_GLOBAL in dlmopened library
>   glibc: dlopen RTLD_NOLOAD optimization
>   add r_debug multiple namespaces support
>
>  elf/dl-close.c |  7 ++++++-
>  elf/dl-debug.c | 13 ++++++++++---
>  elf/dl-open.c  |  8 +++++++-
>  elf/link.h     |  4 ++++
>  4 files changed, 27 insertions(+), 5 deletions(-)

Thanks for looking at this. It is something the community would
absolutely like to see. I'll comment quickly to provide direction.

Florian Weimer, Pedro Alves, and I were talking about this as
recently as April where we tried to agree to just adding a
_r_debug_dlmopen with a new ABI for the debugger to use.

Your proposed solution of bumping the version is unacceptable,
and was last rejected by Roland McGrath. The problem is that
when you bump the version the current 

It is easier from
a backwards compatibility perspective to add a new _r_debug_dlmopen
and use that instead.

gdb checks for r_version != 1 and issues a warning, but keeps going:

          if (linux_read_memory (priv->r_debug + lmo->r_version_offset,
                                 (unsigned char *) &r_version,
                                 sizeof (r_version)) != 0
              || r_version != 1)
                 ^^^^^^^^^^^^^^
            {
              warning ("unexpected r_debug version %d", r_version);
            }

This sets a bad precedent: other software might have hard checks
for r_version != 1 and stop operating correctly.

I suggest reviewing these threads:
https://sourceware.org/legacy-ml/libc-alpha/2012-11/msg00182.html
https://sourceware.org/legacy-ml/libc-alpha/2012-12/msg00278.html
https://sourceware.org/legacy-ml/libc-alpha/2013-01/msg00045.html

An alternative suggested in 2012 was to add a new DT_* entry to point
to the extended debug information e.g. DT_DEBUG_EXTENDED, and so avoid
needing ld.so for lookup of _r_debug_dlmopen.  Gary Benson also suggests
versioning the new structure, but being very clear what a "version bump"
means, in that we compatibly add elements to the end after each version
change. So all consumers would check _r_debug_dlmopen.r_version > 1 to
know they had at least v1 elements.
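
A minimal sketch of the append-only versioning rule described above, under
stated assumptions: `struct r_debug_ext`, its members other than `r_version`,
and the two helper functions are illustrative names and not glibc API. The
consumer compares against the minimum version that introduced a member rather
than testing for equality, so a later version bump does not break it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical append-only debug structure.  Each version may only add
   members at the end; existing members never move or change meaning.  */
struct r_debug_ext
{
  int r_version;        /* Version of this structure, starting at 1.  */
  void *r_first;        /* Present since version 1 (illustrative).  */
  void *r_second;       /* Added in version 2 (illustrative).  */
};

/* Return true if a structure reporting VERSION carries the member that
   was introduced in MIN_VERSION.  Unknown, newer versions are accepted
   because they can only have appended members.  */
static bool
ext_has_field (int version, int min_version)
{
  return version >= min_version;
}

/* Read r_second only when the producer is new enough; otherwise return
   NULL instead of touching a member that may not exist.  */
static void *
ext_read_second (const struct r_debug_ext *r)
{
  assert (r != NULL);
  if (!ext_has_field (r->r_version, 2))
    return NULL;
  return r->r_second;
}
```

With this shape, a consumer built against version 2 keeps working when it
meets version 3, whereas a hard `!= 2` check would reject it.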

And for reference from Solaris:
https://docs.oracle.com/cd/E23824_01/html/819-0690/chapter6-1247.html#chapter6-15
I'd want to avoid having to run code to get at these objects, since
experience has shown this is always going to cause problems. Having
an entirely data-driven approach would be preferable, but locks us
into an ABI that we have to be able to bump.

-- 
Cheers,
Carlos.
* Daniel Walker via Libc-alpha:

> Also included in this are a couple of fixes which went along with the
> original implementation.


Have you seen crashes as the result of dlopen or dlsym failures in
secondary namespaces?

Thanks,
Florian
On Fri, Jun 26, 2020 at 11:30:12PM +0200, Florian Weimer wrote:
> * Daniel Walker via Libc-alpha:
>
> > Also included in this are a couple of fixes which went along with the
> > original implementation.
>
> Have you seen crashes as the result of dlopen or dlsym failures in
> secondary namespaces?

++ Conan

Conan has done the most work on this, and I think he's working on it in terms of
the product usage. I neglected to include him on this cover email. I've added
him to this email; hopefully he can respond.

Daniel
The only crash we saw was caused by the promotion of the RTLD_GLOBAL flag in a
secondary namespace. Apart from that we didn't notice any other crashes or
dlsym failures.

However, we did notice a design limitation with static TLS: shared objects
with static TLS can quickly use up the static TLS block reserved by the loader. This
usually isn't a problem, since only a few core libraries have static TLS and they are
not dlopened. However, each dlmopen loads core libraries such as libc again,
and their static TLS uses up valuable space in the static TLS block, resulting in:

	libc.so.6: cannot allocate memory in static TLS block

We are currently looking at how this can be enhanced. Maybe you guys already 
have discussions around this issue.


On 2020-06-26, 9:11 PM, "Daniel Walker (danielwa)" <danielwa@cisco.com> wrote:

    On Fri, Jun 26, 2020 at 11:30:12PM +0200, Florian Weimer wrote:
    > * Daniel Walker via Libc-alpha:
    >
    > > Also included in this are a couple of fixes which went along with the
    > > original implementation.
    >
    > Have you seen crashes as the result of dlopen or dlsym failures in
    > secondary namespaces?

    ++ Conan

    Conan has done the most work on this, and I think he's working on it in terms of
    the product usage. I neglected to include him on this cover email. I've added
    him to this email hopefully he can respond.

    Daniel
On Fri, Jun 26, 2020 at 05:17:17PM -0400, Carlos O'Donell wrote:

> Thanks for looking at this. It is something the community would
> absolutely like to see. I'll comment quickly to provide direction.
>
> Florian Weimer, Pedro Alves, and I were talking about this as
> recently as April where we tried to agree to just adding a
> _r_debug_dlmopen with a new ABI for the debugger to use.


Here's another RFC I suppose. It's basic code I've only compile tested. It's
based on the comments, and the threads you provided. It just abstracts out the
next link into another structure. Let me know if this is in the ballpark of the
discussions.


---
 elf/dl-debug.c             | 19 +++++++++++++++++--
 elf/link.h                 |  6 ++++++
 sysdeps/generic/ldsodefs.h |  1 +
 3 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/elf/dl-debug.c b/elf/dl-debug.c
index 4b3d3ad6ba..d0009744f8 100644
--- a/elf/dl-debug.c
+++ b/elf/dl-debug.c
@@ -35,6 +35,7 @@ extern const int verify_link_map_members[(VERIFY_MEMBER (l_addr)
    a statically-linked program there is no dynamic section for the debugger
    to examine and it looks for this particular symbol name.  */
 struct r_debug _r_debug;
+struct r_debug_dlmopen _r_debug_dlmopen;
 
 
 /* Initialize _r_debug if it has not already been done.  The argument is
@@ -45,11 +46,22 @@ struct r_debug *
 _dl_debug_initialize (ElfW(Addr) ldbase, Lmid_t ns)
 {
   struct r_debug *r;
+  struct r_debug_dlmopen *r_ns, *rp_ns;
 
   if (ns == LM_ID_BASE)
-    r = &_r_debug;
+    {
+      r = &_r_debug;
+      r_ns = &_r_debug_dlmopen;
+    }
   else
-    r = &GL(dl_ns)[ns]._ns_debug;
+    {
+      r = &GL(dl_ns)[ns]._ns_debug;
+      r_ns = &GL(dl_ns)[ns]._ns_debug_dlmopen;
+      rp_ns = &GL(dl_ns)[ns - 1]._ns_debug_dlmopen;
+      rp_ns->next = r_ns;
+      if (ns - 1 == LM_ID_BASE)
+        _r_debug_dlmopen.next = r_ns;
+    }
 
   if (r->r_map == NULL || ldbase != 0)
     {
@@ -58,6 +70,9 @@ _dl_debug_initialize (ElfW(Addr) ldbase, Lmid_t ns)
       r->r_ldbase = ldbase ?: _r_debug.r_ldbase;
       r->r_map = (void *) GL(dl_ns)[ns]._ns_loaded;
       r->r_brk = (ElfW(Addr)) &_dl_debug_state;
+      r_ns->r_debug = r;
+      r_ns->next = NULL;
+
     }
 
   return r;
diff --git a/elf/link.h b/elf/link.h
index 0048ad5d4d..c81945b671 100644
--- a/elf/link.h
+++ b/elf/link.h
@@ -63,8 +63,14 @@ struct r_debug
     ElfW(Addr) r_ldbase;	/* Base address the linker is loaded at.  */
   };
 
+struct r_debug_dlmopen
+  {
+    struct r_debug *r_debug;
+    struct r_debug_dlmopen *next;
+  };
 /* This is the instance of that structure used by the dynamic linker.  */
 extern struct r_debug _r_debug;
+extern struct r_debug_dlmopen _r_debug_dlmopen;
 
 /* This symbol refers to the "dynamic structure" in the `.dynamic' section
    of whatever module refers to `_DYNAMIC'.  So, to find its own
diff --git a/sysdeps/generic/ldsodefs.h b/sysdeps/generic/ldsodefs.h
index ba114ab4b1..d9794bc7a0 100644
--- a/sysdeps/generic/ldsodefs.h
+++ b/sysdeps/generic/ldsodefs.h
@@ -357,6 +357,7 @@ struct rtld_global
     } _ns_unique_sym_table;
     /* Keep track of changes to each namespace' list.  */
     struct r_debug _ns_debug;
+    struct r_debug_dlmopen _ns_debug_dlmopen;
   } _dl_ns[DL_NNS];
   /* One higher than index of last used namespace.  */
   EXTERN size_t _dl_nns;
-- 
2.17.1
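
From the consumer's side, the chain built by the patch above could be walked
roughly as follows. This is only a sketch: the `_mock` structures are local
stand-ins reduced to the members the walk needs, and `walk_namespaces` and
`ns_visitor` are hypothetical names. A real debugger would read these
structures out of target memory instead of dereferencing pointers directly.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-ins for the structures in the patch, reduced to the
   members the walk needs.  These are not the <link.h> definitions.  */
struct r_debug_mock
{
  int r_version;
  void *r_map;          /* Head of this namespace's link-map list.  */
};

struct r_debug_dlmopen_mock
{
  struct r_debug_mock *r_debug;
  struct r_debug_dlmopen_mock *next;
};

/* Hypothetical callback type, invoked once per namespace.  */
typedef void (*ns_visitor) (const struct r_debug_mock *r, void *arg);

/* Walk the chain starting at HEAD, call VISIT for every entry that has
   an initialized r_debug, and return the number of entries visited.  */
static size_t
walk_namespaces (const struct r_debug_dlmopen_mock *head,
                 ns_visitor visit, void *arg)
{
  size_t count = 0;
  for (const struct r_debug_dlmopen_mock *p = head; p != NULL; p = p->next)
    {
      if (p->r_debug == NULL)
        continue;               /* Entry exists but was never set up.  */
      if (visit != NULL)
        visit (p->r_debug, arg);
      ++count;
    }
  return count;
}
```

Skipping entries whose `r_debug` is still NULL is a defensive choice for the
sketch; whether such entries can appear depends on how the loader links them.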
On 7/23/20 2:40 PM, Daniel Walker (danielwa) wrote:
> Here's another RFC I suppose. It's basic code I've only compile tested. It's
> based on the comments, and the threads you provided. It just abstracts out the
> next link into another structure. Let me know if this is in the ballpark of the
> discussions.


I only looked over this briefly, but I think it's on the right track.

The point is to use *another* data symbol for the debugger to use to access
the link maps. Then the debugger can look for that and try to use that to
access a list of maps.

Your next step would be to export the symbol via Versions at the current
symbol node GLIBC_2.32 (soon to be GLIBC_2.33).

The harder part will be the debugger changes because you have to look for
_r_debug_dlmopen in preference to _r_debug, and they are different layouts,
and once you find _r_debug_dlmopen you have to be able to maintain the
lookup scope of the namespace you're in within the debugger.
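
The lookup preference described above might look roughly like this on the
debugger side. It is a sketch over a made-up symbol table: `struct sym_entry`,
`lookup_symbol`, `find_debug_base`, and `enum debug_layout` are hypothetical
and do not correspond to gdb internals. The layout is returned together with
the address because the two symbols name structures with different layouts.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified symbol table entry.  */
struct sym_entry
{
  const char *name;
  unsigned long addr;
};

enum debug_layout { LAYOUT_NONE, LAYOUT_R_DEBUG, LAYOUT_R_DEBUG_DLMOPEN };

/* Look NAME up in TABLE; return its address, or 0 if it is absent.  */
static unsigned long
lookup_symbol (const struct sym_entry *table, size_t n, const char *name)
{
  for (size_t i = 0; i < n; ++i)
    if (strcmp (table[i].name, name) == 0)
      return table[i].addr;
  return 0;
}

/* Prefer _r_debug_dlmopen and fall back to _r_debug.  The caller must
   parse the memory at *ADDR according to the returned layout.  */
static enum debug_layout
find_debug_base (const struct sym_entry *table, size_t n,
                 unsigned long *addr)
{
  *addr = lookup_symbol (table, n, "_r_debug_dlmopen");
  if (*addr != 0)
    return LAYOUT_R_DEBUG_DLMOPEN;

  *addr = lookup_symbol (table, n, "_r_debug");
  if (*addr != 0)
    return LAYOUT_R_DEBUG;

  return LAYOUT_NONE;
}
```

Against an older ld.so that lacks the new symbol, the same code falls back to
the single-namespace view instead of failing.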


-- 
Cheers,
Carlos.
Daniel Walker (danielwa) Sept. 16, 2020, 4:18 p.m. | #7
On Thu, Jul 23, 2020 at 05:20:23PM -0400, Carlos O'Donell wrote:
> I only looked over this briefly, but I think it's on the right track.
>
> The point is to use *another* data symbol for the debugger to use to access
> the link maps. Then the debugger can look for that and try to use that to
> access a list of maps.
>
> Your next step would be to export the symbol via Versions at the current
> symbol node GLIBC_2.32 (soon to be GLIBC_2.33).
>
> The harder part will be the debugger changes because you have to look for
> _r_debug_dlmopen in preference to _r_debug, and they are different layouts,
> and once you find _r_debug_dlmopen you have to be able to maintain the
> lookup scope of the namespace you're in within the debugger.



The second structure seems to work, except for making it available to GDB. I would
guess there are suggestions for this from you or this list.

A couple of ideas:

1) GDB does pointer arithmetic off the r_debug DT_DEBUG value to find the
r_debug_dlmopen structure. Add a linker script into glibc to force the
arrangement of the two structures in memory, or use a section tag.

2) Add another dynamic linker entry to go along with DT_DEBUG like
DT_DEBUG_DLMOPEN.
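
To make the second idea concrete, here is a minimal sketch of scanning a
dynamic array for a tag, the same way a consumer finds DT_DEBUG today. The tag
value `DT_DEBUG_DLMOPEN_HYPOTHETICAL` is made up for illustration, since no
such tag has been assigned, and `find_dyn_tag` is an assumed helper name. The
sketch uses the 64-bit `Elf64_Dyn` type from `<elf.h>` and works on an
in-memory array, not on a target process.

```c
#include <assert.h>
#include <elf.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tag value for illustration only; no such tag has been
   assigned.  It is placed in the OS-specific range starting at DT_LOOS.  */
#define DT_DEBUG_DLMOPEN_HYPOTHETICAL (DT_LOOS + 0x100)

/* Scan a DT_NULL-terminated dynamic array for TAG.  On success store the
   entry's pointer value in *VALUE and return true.  */
static bool
find_dyn_tag (const Elf64_Dyn *dyn, Elf64_Sxword tag, Elf64_Addr *value)
{
  assert (dyn != NULL && value != NULL);
  for (; dyn->d_tag != DT_NULL; ++dyn)
    if (dyn->d_tag == tag)
      {
        *value = dyn->d_un.d_ptr;
        return true;
      }
  return false;
}
```

A consumer could try the new tag first and fall back to DT_DEBUG when it is
absent, which keeps binaries linked without the new tag debuggable.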


Any other ideas for this?

Thanks,
Daniel
Carlos O'Donell Sept. 17, 2020, 12:52 p.m. | #8
On 9/16/20 12:18 PM, Daniel Walker (danielwa) wrote:
> The second structure seems to work except making it available to GDB. I would
> guess there are suggestions for this from you or this list.
>
> A couple ideas,
>
> 1) GDB does pointer arithmetic off the r_debug DT_DEBUG value to find the
> r_debug_dlmopen structure. Add a linker script into glibc to force the two
> structures arrangement in memory, or use a section tag.


In gdbserver I see that it's using DT_DEBUG exclusively to find _r_debug.

in gdb/solib-svr4.c:

  /* Find DT_DEBUG.  */
  if (scan_dyntag (DT_DEBUG, exec_bfd, &dyn_ptr, NULL)
      || scan_dyntag_auxv (DT_DEBUG, &dyn_ptr, NULL))
    return dyn_ptr;

  /* This may be a static executable.  Look for the symbol
     conventionally named _r_debug, as a last resort.  */
  msymbol = lookup_minimal_symbol ("_r_debug", NULL, symfile_objfile);
  if (msymbol.minsym != NULL)
    return BMSYMBOL_VALUE_ADDRESS (msymbol);

This code makes the most sense to me.

You look for DT_DEBUG otherwise lookup _r_debug (which is _r_debug@@GLIBC_2.2.5 on x86_64).

I would say that finding _r_debug_dlmopen would require looking up the
symbol, not as a last resort, but as a definition of the API.

You will always have .dynsym with a definition for _r_debug_dlmopen.
 
> 2) Add another dynamic linker entry to go along with DT_DEBUG like
> DT_DEBUG_DLMOPEN.


This is one way which avoids hard coding _r_debug_dlmopen and instead
puts it into a DT_* tag, but requires we add a new tag.

I have no strong opinion here. Having the tag avoids going through
the symbol lookup, so it could have good value.

In gdbserver/linux-low.cc we have get_r_debug, which does nothing
but look at DT_DEBUG. This would need changing to look up
_r_debug_dlmopen in that area, or DT_DEBUG_DLMOPEN.

However, looking at my i686/x86_64 system I don't see DT_DEBUG being
set so I don't know how this works with gdbserver? I could have sworn
we were using DT_DEBUG on x86... if we don't then we should fix that,
but that's another bug.

-- 
Cheers,
Carlos.
Florian Weimer Sept. 17, 2020, 12:59 p.m. | #9
* Carlos O'Donell:

> You will always have .dynsym with a definition for _r_debug_dlmopen.


Note that this doesn't work if you just have a core file.  In order to
find _r_debug (or _r_debug_dlmopen), a debugger needs the exact same
copy of ld.so that was used by the executable, otherwise the symbol
cannot be found in the image.

Thanks,
Florian
-- 
Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill
On 9/17/20 8:52 AM, Carlos O'Donell wrote:
> However, looking at my i686/x86_64 system I don't see DT_DEBUG being
> set so I don't know how this works with gdbserver? I could have sworn
> we were using DT_DEBUG on x86... if we don't then we should fix that,
> but that's another bug.


I looked at the wrong thing.

We *are* creating a DT_DEBUG entry.

The loader fills DT_DEBUG with &_r_debug at runtime.

Core files have DT_DEBUG with the runtime &_r_debug value.

This means a debugger working from a core file can avoid looking things up in ld.so.

-- 
Cheers,
Carlos.
On 9/17/20 8:59 AM, Florian Weimer wrote:
> * Carlos O'Donell:
>
>> You will always have .dynsym with a definition for _r_debug_dlmopen.
>
> Note that this doesn't work if you just have a core file.  In order to
> find _r_debug (or _r_debug_dlmopen), a debugger needs the exact same
> copy of ld.so that was used by the executable, otherwise the symbol
> cannot be found in the image.


You are correct.

I followed up on my own email regarding this.

So in the end to get process and core file debugging we'll need:

* _r_debug_dlmopen
* DT_DEBUG_DLMOPEN

-- 
Cheers,
Carlos.
On Thu, Sep 17, 2020 at 09:53:30AM -0400, Carlos O'Donell wrote:
> So in the end to get process and core file debugging we'll need:
>
> * _r_debug_dlmopen
> * DT_DEBUG_DLMOPEN


It seems like adding DT_DEBUG_DLMOPEN into the gABI might take some effort. Have
you considered this? The last one which was added was DT_SYMTAB_SHNDX in 2018,
and it looks like it did not come from glibc.

Daniel
* Daniel Walker:

> It seems like adding DT_DEBUG_DLMOPEN into the gABI might take some
> effort. Have you considered this ? The last one which was added was
> DT_SYMTAB_SHNDX in 2018, and it looks like it did not come from glibc.


We are reviving GNU gABI maintenance.  There's been quite a bit of list
activity, and a proposal of a first ABI document:

  <https://sourceware.org/pipermail/gnu-gabi/2020q3/thread.html>

I have a feeling that we might be soon over this bump, and getting
things added should become easier.

In the meantime, can we demo this feature without DT_DEBUG_DLMOPEN?
With a patched glibc and gdb?  Incidentally, I have an LD_AUDIT issue I
need to debug. 8-)

Thanks,
Florian
On Fri, Sep 18, 2020 at 05:40:30PM +0200, Florian Weimer wrote:
> We are reviving GNU gABI maintenance.  There's been quite a bit of list
> activity, and a proposal of a first ABI document:
>
>   <https://sourceware.org/pipermail/gnu-gabi/2020q3/thread.html>
>
> I have a feeling that we might be soon over this bump, and getting
> things added should become easier.
>
> In the meantime, can we demo this feature without DT_DEBUG_DLMOPEN?
> With a patch glibc and gdb?  Incidentally, I have an LD_AUDIT issue I
> need to debug. 8-)


The only fully working version we have is the one I released originally. Yes,
that version had no DT_DEBUG_DLMOPEN. It should be working and you can demo it. 

We're still working on updating GDB to use the new interfaces.

In terms of updating the gABI, should I just add a patch to glibc to add values,
or do I need to submit special documents?

Daniel
* Carlos O'Donell via Libc-alpha:

> Your next step would be to export the symbol via Versions at the current
> symbol node GLIBC_2.32 (soon to be GLIBC_2.33).


Can we create a new GLIBC_DEBUG symbol version for symbols which are
not intended to be used for run-time linking?

The idea is that consumers will have to deal with the absence of these
symbols anyway, so we just need one symbol version that does not depend
on the glibc version for this.  Dependency management considerations
(that apply to symbols with run-time linking) do not come into play here.

Thanks,
Florian
-- 
Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill
On 9/22/20 1:06 PM, Florian Weimer wrote:
> * Carlos O'Donell via Libc-alpha:
>
>> Your next step would be to export the symbol via Versions at the current
>> symbol node GLIBC_2.32 (soon to be GLIBC_2.33).
>
> Can we create a new GLIBC_DEBUG symbol version for symbols which are
> not intended to be used for run-time linking?
>
> The idea is that consumers will have to deal with the absence of these
> symbols anyway, so we just need one symbol version that does not depend
> on the glibc version for this.  Dependency management considerations
> (that apply to symbols with run-time linking) do not come into play here.


I don't object to GLIBC_DEBUG. Like GLIBC_PRIVATE, can it be considered
a transient ABI that is valid only for a major release?

-- 
Cheers,
Carlos.
* Carlos O'Donell:

> On 9/22/20 1:06 PM, Florian Weimer wrote:
>> * Carlos O'Donell via Libc-alpha:
>>
>>> Your next step would be to export the symbol via Versions at the current
>>> symbol node GLIBC_2.32 (soon to be GLIBC_2.33).
>>
>> Can we create a new GLIBC_DEBUG symbol version for symbols which are
>> not intended to be used for run-time linking?
>>
>> The idea is that consumers will have to deal with the absence of these
>> symbols anyway, so we just need one symbol version that does not depend
>> on the glibc version for this.  Dependency management considerations
>> (that apply to symbols with run-time linking) do not come into play here.
>
> I don't object to GLIBC_DEBUG. Like GLIBC_PRIVATE, can it be considered
> a transient ABI that is valid only for a major release?


No, unlike GLIBC_PRIVATE, you can assume that if a GLIBC_DEBUG symbol is
there (and perhaps has the documented size), it has the documented
semantics.  But you can't assume that it is present.

The semantics of GLIBC_PRIVATE symbols can change arbitrarily, even
between builds.
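
To illustrate the contract described above (a symbol may be absent, but
if it is present with the documented size it has the documented
semantics), here is a minimal sketch of the check a consumer might
perform. It is not taken from glibc or gdb: struct sym_info,
probe_debug_symbol and the enum are hypothetical names, and the table
stands in for whatever a debugger extracts from the dynamic symbol
table.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record built by a debugger from a symbol table. */
struct sym_info {
    const char *name;     /* symbol name */
    const char *version;  /* symbol version string */
    size_t      size;     /* st_size of the symbol */
};

enum probe_result { PROBE_ABSENT, PROBE_SIZE_MISMATCH, PROBE_OK };

/* Look for name@GLIBC_DEBUG.  Absence is a normal outcome, and a size
   other than the documented one means the consumer must not assume
   the documented layout.  */
static enum probe_result
probe_debug_symbol(const struct sym_info *syms, size_t nsyms,
                   const char *name, size_t documented_size)
{
    assert(name != NULL);
    for (size_t i = 0; i < nsyms; i++) {
        if (strcmp(syms[i].name, name) != 0
            || strcmp(syms[i].version, "GLIBC_DEBUG") != 0)
            continue;
        return syms[i].size == documented_size
               ? PROBE_OK : PROBE_SIZE_MISMATCH;
    }
    return PROBE_ABSENT;
}
```

Only PROBE_OK allows the consumer to rely on the documented semantics;
the other two results are handled the same way as an older glibc.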

Thanks,
Florian
On 9/22/20 1:37 PM, Florian Weimer wrote:
> * Carlos O'Donell:
>
>> On 9/22/20 1:06 PM, Florian Weimer wrote:
>>> * Carlos O'Donell via Libc-alpha:
>>>
>>>> Your next step would be to export the symbol via Versions at the current
>>>> symbol node GLIBC_2.32 (soon to be GLIBC_2.33).
>>>
>>> Can we create a new GLIBC_DEBUG symbol version for symbols which are
>>> not intended to be used for run-time linking?
>>>
>>> The idea is that consumers will have to deal with the absence of these
>>> symbols anyway, so we just need one symbol version that does not depend
>>> on the glibc version for this.  Dependency management considerations
>>> (that apply to symbols with run-time linking) do not come into play here.
>>
>> I don't object to GLIBC_DEBUG. Like GLIBC_PRIVATE, can it be considered
>> a transient ABI that is valid only for a major release?
>
> No, unlike GLIBC_PRIVATE, you can assume that if a GLIBC_DEBUG symbol is
> there (and perhaps has the documented size), it has the documented
> semantics.  But you can't assume that it is present.
>
> The semantics of GLIBC_PRIVATE symbols can change arbitrarily, even
> between builds.


Yes, absolutely, I agree completely. For it to be useful, the semantics
have to be:

- If you detect a given symbol foo@GLIBC_DEBUG, then the feature is
  present and has the semantics you expect.

- If you want new semantics then you need to make a foo2@GLIBC_DEBUG
  with the new semantics.
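
The two rules above imply that a consumer probes for the newest
interface it understands and falls back to older ones. A minimal sketch
of that selection follows; select_debug_interface is a hypothetical
helper, and "foo" / "foo2" are just the placeholder names used in this
thread, not real glibc symbols.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* `preferred` lists the interface names the consumer understands,
   newest first.  `available` lists the GLIBC_DEBUG symbol names found
   in the target.  Returns the index into `preferred` of the newest
   interface that is present, or -1 if none is.  */
static int
select_debug_interface(const char *const *available, size_t navailable,
                       const char *const *preferred, size_t npreferred)
{
    assert(preferred != NULL || npreferred == 0);
    for (size_t p = 0; p < npreferred; p++)
        for (size_t a = 0; a < navailable; a++)
            if (strcmp(preferred[p], available[a]) == 0)
                return (int) p;
    return -1;
}
```

Because a symbol's semantics never change once published, the consumer
does not need to know which glibc release it is looking at.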

What are the runtime semantics of the symbol? How do you access it?

-- 
Cheers,
Carlos.
* Carlos O'Donell:

>> No, unlike GLIBC_PRIVATE, you can assume that if a GLIBC_DEBUG symbol is
>> there (and perhaps has the documented size), it has the documented
>> semantics.  But you can't assume that it is present.
>>
>> The semantics of GLIBC_PRIVATE symbols can change arbitrarily, even
>> between builds.
>
> Yes, absolutely, I agree completely. For it to be useful, the semantics
> have to be:
>
> - If you detect a given symbol foo@GLIBC_DEBUG, then the feature is
>   present and has the semantics you expect.
>
> - If you want new semantics then you need to make a foo2@GLIBC_DEBUG
>   with the new semantics.
>
> What are the runtime semantics of the symbol? How do you access it?


That obviously depends on the symbol?  Sorry, I don't quite understand
these questions.

Thanks,
Florian
Andreas Schwab Sept. 22, 2020, 6:17 p.m. | #20
On Sep 22 2020, Carlos O'Donell via Libc-alpha wrote:

> Yes, absolutely, I agree completely. For it to be useful, the semantics
> have to be:
>
> - If you detect a given symbol foo@GLIBC_DEBUG, then the feature is
>   present and has the semantics you expect.
>
> - If you want new semantics then you need to make a foo2@GLIBC_DEBUG
>   with the new semantics.
>
> What are the runtime semantics of the symbol? How do you access it?


Isn't that the same situation as libthread_db?

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."
On 9/22/20 2:04 PM, Florian Weimer wrote:
> * Carlos O'Donell:
>
>>> No, unlike GLIBC_PRIVATE, you can assume that if a GLIBC_DEBUG symbol is
>>> there (and perhaps has the documented size), it has the documented
>>> semantics.  But you can't assume that it is present.
>>>
>>> The semantics of GLIBC_PRIVATE symbols can change arbitrarily, even
>>> between builds.
>>
>> Yes, absolutely, I agree completely. For it to be useful, the semantics
>> have to be:
>>
>> - If you detect a given symbol foo@GLIBC_DEBUG, then the feature is
>>   present and has the semantics you expect.
>>
>> - If you want new semantics then you need to make a foo2@GLIBC_DEBUG
>>   with the new semantics.
>>
>> What are the runtime semantics of the symbol? How do you access it?
>
> That obviously depends on the symbol?  Sorry, I don't quite understand
> these questions.


You noted "not intended to be used for run-time linking?"

Could you expand on what you're thinking there?

-- 
Cheers,
Carlos.
* Carlos O'Donell:

> On 9/22/20 2:04 PM, Florian Weimer wrote:
>> * Carlos O'Donell:
>>
>>>> No, unlike GLIBC_PRIVATE, you can assume that if a GLIBC_DEBUG symbol is
>>>> there (and perhaps has the documented size), it has the documented
>>>> semantics.  But you can't assume that it is present.
>>>>
>>>> The semantics of GLIBC_PRIVATE symbols can change arbitrarily, even
>>>> between builds.
>>>
>>> Yes, absolutely, I agree completely. For it to be useful, the semantics
>>> have to be:
>>>
>>> - If you detect a given symbol foo@GLIBC_DEBUG, then the feature is
>>>   present and has the semantics you expect.
>>>
>>> - If you want new semantics then you need to make a foo2@GLIBC_DEBUG
>>>   with the new semantics.
>>>
>>> What are the runtime semantics of the symbol? How do you access it?
>>
>> That obviously depends on the symbol?  Sorry, I don't quite understand
>> these questions.
>
> You noted "not intended to be used for run-time linking?"
>
> Could you expand on what you're thinking there?


If there are no versioned dependencies on the symbol at the ELF level,
then the issues that require some distributions to backport whole symbol
sets do not apply.  The exact contents of the GLIBC_DEBUG symbol set
do not matter then.

Thanks,
Florian
On 9/22/20 2:44 PM, Florian Weimer wrote:
> * Carlos O'Donell:
>
>> On 9/22/20 2:04 PM, Florian Weimer wrote:
>>> * Carlos O'Donell:
>>>
>>>>> No, unlike GLIBC_PRIVATE, you can assume that if a GLIBC_DEBUG symbol is
>>>>> there (and perhaps has the documented size), it has the documented
>>>>> semantics.  But you can't assume that it is present.
>>>>>
>>>>> The semantics of GLIBC_PRIVATE symbols can change arbitrarily, even
>>>>> between builds.
>>>>
>>>> Yes, absolutely, I agree completely. For it to be useful, the semantics
>>>> have to be:
>>>>
>>>> - If you detect a given symbol foo@GLIBC_DEBUG, then the feature is
>>>>   present and has the semantics you expect.
>>>>
>>>> - If you want new semantics then you need to make a foo2@GLIBC_DEBUG
>>>>   with the new semantics.
>>>>
>>>> What are the runtime semantics of the symbol? How do you access it?
>>>
>>> That obviously depends on the symbol?  Sorry, I don't quite understand
>>> these questions.
>>
>> You noted "not intended to be used for run-time linking?"
>>
>> Could you expand on what you're thinking there?
>
> If there are no versioned dependencies on the symbol at the ELF level,
> then the issues that require some distributions to backport whole symbol
> sets do not apply.  The exact contents of the GLIBC_DEBUG symbol set
> do not matter then.


Thank you for the clarification. I agree completely.

-- 
Cheers,
Carlos.
On 9/22/20 2:17 PM, Andreas Schwab wrote:
> On Sep 22 2020, Carlos O'Donell via Libc-alpha wrote:
>
>> Yes, absolutely, I agree completely. For it to be useful, the semantics
>> have to be:
>>
>> - If you detect a given symbol foo@GLIBC_DEBUG, then the feature is
>>   present and has the semantics you expect.
>>
>> - If you want new semantics then you need to make a foo2@GLIBC_DEBUG
>>   with the new semantics.
>>
>> What are the runtime semantics of the symbol? How do you access it?
>
> Isn't that the same situation as libthread_db?


Yes, but the symbol would be coupled to libc.so, so it doesn't require
finding and loading another matching library.

Taking that direction would mean creating a symbol in libthread_db.
In this particular case the symbol would provide the address of
the new structure that you could walk that contains the namespace
lists (that themselves contain linkmap lists).

In my opinion we should be heading towards the complete removal of
libthread_db from glibc because as an interface it requires that
the debugger load a library from a potentially untrusted filesystem
(or container) and execute code in order to debug the process.

I would rather see data-driven approaches where foo@GLIBC_DEBUG is
a data symbol and exposes a structure that can be walked to gather
information about the inferior.
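
As a rough illustration of what "a structure that can be walked" could
look like, here is a sketch of a list of namespaces, each holding a
list of link maps. The layout and all names (dbg_namespace,
dbg_link_map, count_objects) are hypothetical and are not the structure
proposed in the patch series. A real debugger would also read these
nodes out of the inferior's memory rather than dereference pointers
directly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-object node, analogous to a link map entry. */
struct dbg_link_map {
    const char *name;                  /* path of the loaded object */
    const struct dbg_link_map *next;   /* next object in this namespace */
};

/* Hypothetical per-namespace node, one per dlmopen namespace. */
struct dbg_namespace {
    const struct dbg_link_map *head;   /* first object in the namespace */
    const struct dbg_namespace *next;  /* next namespace */
};

/* Walk every namespace and every link map in it.  Stores the object
   count of namespace i in per_ns[i] for i < max_ns and returns the
   total number of objects.  No target code is executed.  */
static size_t
count_objects(const struct dbg_namespace *ns, size_t *per_ns, size_t max_ns)
{
    assert(per_ns != NULL || max_ns == 0);
    size_t total = 0;
    for (size_t idx = 0; ns != NULL; ns = ns->next, idx++) {
        size_t n = 0;
        for (const struct dbg_link_map *lm = ns->head; lm != NULL;
             lm = lm->next)
            n++;
        if (idx < max_ns)
            per_ns[idx] = n;
        total += n;
    }
    return total;
}
```

The point of the sketch is that the walk is pure data traversal, which
is what makes it usable from a core file or a kernel-side agent.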

It is also difficult if not impossible for a kernel-side agent to
run target code from libthread_db to resolve the result.

Keeping the symbol in libc.so avoids any debugger having to
locate the matching libthread_db, which is not always in the same
place as the library.

In summary:
- Use data symbols.
- Avoid needing to run code to resolve the result.
- Keep the interface matched and in libc.so.

-- 
Cheers,
Carlos.