[00/19] libctf, and CTF support for objdump and readelf

Message ID 20190430225706.159422-1-nick.alcock@oracle.com

Message

Nick Alcock April 30, 2019, 10:56 p.m.
This submission is the first part of multiple patch series which together add
support for the Compact ANSI-C Type Format to the GNU toolchain.

Compact C Type Format (CTF) is a reduced form of debugging information whose
main purpose is to describe the type of C entities such as structures, unions,
typedefs and function arguments.  CTF format is optimized for compactness: it
was originally designed for use-cases like dynamic tracing and online
in-application debugging, where debugging information is meant to be present
even in stripped binaries.

CTF gains compactness over DWARF in four ways:

  - a more compact encoding, at the cost of irregularity.  Rather than a regular
    scheme of tags and attributes within those tags, the structures are
    customized for each kind of C type described, which allows significant space
    savings.  IDs are omitted and implied wherever possible to save even more
    space (the ID of each type in the type table is implied by its position,
    trading space for the need to scan the table at load time).

  - reuse of strings from the ELF file: CTF files have one string table built
    into the CTF format itself and one "external" table which is usually the
    table in the ELF file.  Further improvements are possible here without
    format changes: we are looking into this.

  - a very compact association between the ELF symbol table and CTF.  No symbol
    table indexes are recorded; all are implied.  The data-object section
    is as compact as possible, containing nothing but a stream of type IDs
    describing the type of data symbols in symbol table order.

  - aggressive link-time deduplication will be added in the next patch
    series and will be the default behavior, resulting in further space
    savings.

Types in CTF can be looked up by traversal from other types via a numeric type
ID, by traversal of all types in the file, by ELF symbol table ID or by name
(though not all types need be named). As in C, there are separate namespaces for
top-level types, enums, structs and unions.  There is no analogue of block
scope: types within functions must either be promoted to the top level, stored
in another CTF container (perhaps using CTF's single-level parent/child
relationship), or not represented at all. Since the principal use case of CTF is
to look up types given symbol table entries, and symbol tables are also a single
flat namespace, this is not expected to be a serious limitation.

All types CTF describes have a fixed size at CTF generation time, and there is
nothing like the DWARF exprloc interpreter to compute the size of variably-sized
entities. This is due to the adopted 'top-level model' and, consequently, VLAs
are not supported.

For an overview of the CTF format, see the documentation in the email I'll post
as a followup to this patch series (we have yet to figure out where to put it).

Most of this patch series implements a library that reads and writes this
format, and a header file describing it.  The linker, debugger, and
objdump/objcopy are expected to use the library, while GCC will not.


This first submission consists of the core library in libctf/, and CTF support
in objdump and readelf: this leverages the debugging dump support in libctf, so
the objdump and readelf support is a few dozen lines each on top of that.


This patch series is an RFC. We are still implementing some additional features
in order to capture the full potential of the format.  Enhancements planned
include:

A new section implementing a compact version of DW_AT_call_site et al, allowing
efficient backtracing even when the debuginfo is missing, including recovery of
the value of parameters in at least 99.9% of the cases handled by DWARF 5 in
existing code.

Enhancements to the core data structures (particularly ctf_stype_t and
ctf_member_t) reducing space spent on unused type bits when type IDs are < 2^16.

Nick Alcock (19):
  include: new header ctf.h: file format description
  include: new header ctf-api.h
  libctf: lowest-level memory allocation and debug-dumping wrappers
  libctf: low-level list manipulation and helper utilities
  libctf: error handling
  libctf: hashing
  libctf: implementation definitions related to file creation
  libctf: creation functions
  libctf: opening
  libctf: ELF file opening
  libctf: core type lookup
  libctf: lookups by name and symbol
  libctf: type copying
  libctf: library version enforcement
  libctf: mmappable archives
  libctf: labels
  libctf: debug dumping
  libctf: build system
  binutils: CTF support for objdump and readelf

 Makefile.def                    |    5 +
 Makefile.in                     |  984 ++++-
 binutils/Makefile.am            |   10 +-
 binutils/Makefile.in            |   18 +-
 binutils/aclocal.m4             |   10 +-
 binutils/doc/Makefile.in        |    9 +-
 binutils/doc/binutils.texi      |   12 +
 binutils/doc/ctf.options.texi   |   19 +
 binutils/objdump.c              |  156 +-
 binutils/readelf.c              |  206 +
 configure                       |    2 +-
 configure.ac                    |    2 +-
 include/ctf-api.h               |  354 ++
 include/ctf.h                   |  427 ++
 libctf/Makefile.am              |   31 +
 libctf/Makefile.in              |  767 ++++
 {binutils => libctf}/aclocal.m4 |   99 +-
 libctf/config.h.in              |   98 +
 libctf/configure                | 7120 +++++++++++++++++++++++++++++++
 libctf/configure.ac             |   59 +
 libctf/ctf-archive.c            |  491 +++
 libctf/ctf-create.c             | 1937 +++++++++
 libctf/ctf-decl.c               |  195 +
 libctf/ctf-dump.c               |  595 +++
 libctf/ctf-error.c              |   93 +
 libctf/ctf-hash.c               |  277 ++
 libctf/ctf-impl.h               |  404 ++
 libctf/ctf-labels.c             |  138 +
 libctf/ctf-lib.c                |  506 +++
 libctf/ctf-lookup.c             |  427 ++
 libctf/ctf-open.c               | 1359 ++++++
 libctf/ctf-subr.c               |   74 +
 libctf/ctf-types.c              | 1019 +++++
 libctf/ctf-util.c               |  176 +
 34 files changed, 18008 insertions(+), 71 deletions(-)
 create mode 100644 binutils/doc/ctf.options.texi
 create mode 100644 include/ctf-api.h
 create mode 100644 include/ctf.h
 create mode 100644 libctf/Makefile.am
 create mode 100644 libctf/Makefile.in
 copy {binutils => libctf}/aclocal.m4 (95%)
 create mode 100644 libctf/config.h.in
 create mode 100755 libctf/configure
 create mode 100644 libctf/configure.ac
 create mode 100644 libctf/ctf-archive.c
 create mode 100644 libctf/ctf-create.c
 create mode 100644 libctf/ctf-decl.c
 create mode 100644 libctf/ctf-dump.c
 create mode 100644 libctf/ctf-error.c
 create mode 100644 libctf/ctf-hash.c
 create mode 100644 libctf/ctf-impl.h
 create mode 100644 libctf/ctf-labels.c
 create mode 100644 libctf/ctf-lib.c
 create mode 100644 libctf/ctf-lookup.c
 create mode 100644 libctf/ctf-open.c
 create mode 100644 libctf/ctf-subr.c
 create mode 100644 libctf/ctf-types.c
 create mode 100644 libctf/ctf-util.c

-- 
2.21.0.237.gd0cfaa883d

Comments

Nick Alcock April 30, 2019, 10:57 p.m. | #1
CTF FILE FORMAT
---------------

A CTF file ("container", since it is usually not a file, but an ELF section or
something of that sort) is divided into a number of sections internally,
identified by offset from the header. In order, the sections are:

 - Type label section
 - Data object section
 - Function info section
 - Variable info section
 - Data type section
 - String table

We'll consider these in order of importance (not the same as order in the file).

Other things in the header:
  - a preamble containing a magic number (used to determine container
    endianness: libctf will endian-flip foreign-endian containers into the
    native endianness at open time), a version number, whose current value is
    the CTF_VERSION constant, and a set of CTF_F global flags
  - a parent container name and label, which indicates (in some
    consumer-dependent way) the name of the container containing types whose ID
    has its MSB turned on (the "parent container"): it is only nonzero if this
    container is not itself a parent.  This allows types to be shared between
    containers: with one container being the parent of potentially many others.
    (The parent label has space allocated in the header, but is not used by any
    code in libctf at present.)

This does mean that a container cannot be used both as a parent and as a child
container at the same time, because type IDs referring to types within the same
container will have their MSB turned on if this was constructed as a parent
container.  While there is a parent name and parent label in the header, it is
purely up to the CTF consumer and convention how this is interpreted: neither
libctf nor the format prohibits ctf_import()ing any container at all as a parent
container, though you should in general import the same parent at consumption
time as you did when you generated the container, or things will misbehave.
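
As a rough illustration of the endianness detection mentioned for the preamble,
a reader can compare the magic number against both its native and byte-swapped
forms.  This is only a sketch, not libctf's code: the 16-bit width and the
SK_CTF_MAGIC value are assumptions (the text above does not state them), all
sk_* names are hypothetical, and the swap is written by hand so that nothing
depends on byteswap.h.

```c
#include <stdint.h>

/* Assumed magic value and width for this sketch; not stated in the text.  */
#define SK_CTF_MAGIC 0xdff2u

typedef enum
{
  SK_ENDIAN_NATIVE,		/* Magic matches as read.  */
  SK_ENDIAN_FOREIGN,		/* Magic matches after a byte swap.  */
  SK_ENDIAN_BAD			/* Not a recognizable container.  */
} sk_endian_t;

/* Portable 16-bit byte swap.  */
static uint16_t
sk_swap16 (uint16_t v)
{
  return (uint16_t) ((v >> 8) | (v << 8));
}

/* Classify a container from its magic, as read in host byte order.  */
static sk_endian_t
sk_classify_magic (uint16_t magic)
{
  if (magic == SK_CTF_MAGIC)
    return SK_ENDIAN_NATIVE;
  if (sk_swap16 (magic) == SK_CTF_MAGIC)
    return SK_ENDIAN_FOREIGN;
  return SK_ENDIAN_BAD;
}
```

A SK_ENDIAN_FOREIGN result is the case in which every multi-byte field in the
container would then need flipping before use.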


Data type section
-----------------

This is the core section in a CTF file, an array of variable-length entries,
each entry a struct ctf_stype or struct ctf_type followed by optional
variable-length data.  Each array index is transformed into a type ID by
flipping on the MSB iff this is a parent type container.  These type IDs are how
types are referenced within CTF containers.  The ID of each type is not stored
with the type, but is implied by its array index.

The ctf_type_t and ctf_stype_t act as a discriminated union with an identical
first few members:

typedef struct ctf_stype
{
  uint32_t ctt_name;		/* Reference to name in string table.  */
  uint32_t ctt_info;		/* Encoded kind, variant length (see below).  */
  union
  {
    uint32_t ctt_size;		/* Size of entire type in bytes.  */
    uint32_t ctt_type;		/* Reference to another type.  */
  };
} ctf_stype_t;

All types are represented by an instance of one of these structures: ctt_name is
0 for unnamed types, while ctt_info is a tiny bitfielded structure accessed via
masking:

               ------------------------
   ctt_info:   | kind | isroot | vlen |
               ------------------------
               31    26    25  24     0
where

kind: a CTF_K_* constant indicating whether this type is an int, a float, an array,
      a pointer, a structure or what-have-you (see below)
isroot: is 1 if this type has a name, 0 otherwise
vlen: the length of a kind-specific variable data region ("variant data") which
      immediately follows the ctf_stype or ctf_type structure, and contains 
      type-kind-specific properties (array length, an array of structure
      members, or whatever). The data in the vlen region is the closest thing to
      most of the attributes used by DWARF to describe types.  In general, only
      kinds for which the vlen is actually variable can be trusted to have
      useful values in this field: for all other kinds, the vlen is meaningless
      and is usually hardwired for that kind where needed.  ctf.h defines the
      currently-valid set of kinds:

#define CTF_K_UNKNOWN	0	/* Unknown type (used for padding).  */
#define CTF_K_INTEGER	1	/* Variant data is CTF_INT_DATA (see below).  */
#define CTF_K_FLOAT	2	/* Variant data is CTF_FP_DATA (see below).  */
#define CTF_K_POINTER	3	/* ctt_type is referenced type.  */
#define CTF_K_ARRAY	4	/* Variant data is single ctf_array_t.  */
#define CTF_K_FUNCTION	5	/* ctt_type is return type, variant data is
				   list of argument types (unsigned short's for v1,
				   uint32_t's for v2).  */
#define CTF_K_STRUCT	6	/* Variant data is list of ctf_member_t's.  */
#define CTF_K_UNION	7	/* Variant data is list of ctf_member_t's.  */
#define CTF_K_ENUM	8	/* Variant data is list of ctf_enum_t's.  */
#define CTF_K_FORWARD	9	/* No additional data; ctt_name is tag.  */
#define CTF_K_TYPEDEF	10	/* ctt_type is referenced type.  */
#define CTF_K_VOLATILE	11	/* ctt_type is base type.  */
#define CTF_K_CONST	12	/* ctt_type is base type.  */
#define CTF_K_RESTRICT	13	/* ctt_type is base type.  */
#define CTF_K_SLICE	14	/* Variant data is a ctf_slice_t.  */

#define CTF_K_MAX	63	/* Maximum possible (V2) CTF_K_* value.  */

Most of these obviously relate directly to specific C types: the only strange
one is 'slice', which allows you to take an integral type and modify its
bitness, for easy construction of bitfields (a slice of a CTF_K_ENUM is the only
way to specify an enum bitfield).
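
To make the ctt_info layout concrete, here is a small sketch of packing and
unpacking it by masking.  The bit positions are read directly off the diagram
above (kind in bits 31..26, isroot in bit 25, vlen in bits 24..0); the SK_* and
sk_* names are hypothetical and are not the accessors ctf.h itself provides.

```c
#include <stdint.h>

/* Hypothetical accessors following the ctt_info diagram.  */
#define SK_INFO_KIND(info)   (((uint32_t) (info) >> 26) & 0x3fu)
#define SK_INFO_ISROOT(info) (((uint32_t) (info) >> 25) & 0x1u)
#define SK_INFO_VLEN(info)   ((uint32_t) (info) & 0x01ffffffu)

/* Pack kind, isroot and vlen into a single ctt_info word.  */
static uint32_t
sk_make_info (uint32_t kind, uint32_t isroot, uint32_t vlen)
{
  return ((kind & 0x3fu) << 26)
	 | ((isroot & 0x1u) << 25)
	 | (vlen & 0x01ffffffu);
}
```

For example, a named structure (kind 6) with three members would be built with
sk_make_info (6, 1, 3).  The six-bit kind field is what gives CTF_K_MAX its
value of 63.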


Looking at the rest of the ctf_stype_t, the ctt_size / ctt_type union is a trick
to reduce sizes. Most type-kinds that refer to another type (like pointers, or
cv-quals) have a fixed size, defined by the platform ABI (libctf calls this the
'machine model'): most types that have a variable size do not refer to another
type: all the most voluminous type kinds either do one or the other. So the
ctt_size / ctt_type contains whichever of these is applicable to the type in
question. (A few kinds, like structures or function pointers, refer to more than
one type ID: in this case, relevant type IDs are carried in the vlen data.)

For very large types the ctf_stype is not enough: the size of types can exceed
that representable by a uint32_t.  For these, we use a ctf_type_t instead:

typedef struct ctf_type
{
  uint32_t ctt_name;		/* Reference to name in string table.  */
  uint32_t ctt_info;		/* Encoded kind, variant length (see below).  */
  union
  {
    uint32_t ctt_size;		/* Always CTF_LSIZE_SENT.  */
    uint32_t ctt_type;		/* Do not use.  */
  };
  uint32_t ctt_lsizehi;		/* High 32 bits of type size in bytes.  */
  uint32_t ctt_lsizelo;		/* Low 32 bits of type size in bytes.  */
} ctf_type_t;

As noted above, this overlays on top of the ctf_stype_t, so almost all code can
just deal directly with whichever it prefers and check ctt_size to see if this
is a ctf_type or ctf_stype. You distinguish a ctf_type_t from a ctf_stype_t
because ctf_type_t has ctt_size == CTF_LSIZE_SENT (which is an invalid value for
a type ID).
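
A minimal sketch of that check, for kinds that use ctt_size rather than
ctt_type, might look like the following.  The sentinel value is an assumption
(the text names CTF_LSIZE_SENT but does not give its value), and sk_type_t is a
local, flattened mirror of the structure above rather than the real ctf_type_t.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed sentinel value; only the name CTF_LSIZE_SENT appears above.  */
#define SK_LSIZE_SENT 0xffffffffu

/* Local mirror of the ctf_type_t layout, with the union flattened.  */
typedef struct sk_type
{
  uint32_t ctt_name;
  uint32_t ctt_info;
  uint32_t ctt_size;		/* Shares storage with ctt_type.  */
  uint32_t ctt_lsizehi;		/* Only present in the long form.  */
  uint32_t ctt_lsizelo;		/* Only present in the long form.  */
} sk_type_t;

/* Return the type size in bytes and store the header length in *HDRLEN.
   The last two fields are only read when the sentinel is present.  */
static uint64_t
sk_type_size (const sk_type_t *tp, size_t *hdrlen)
{
  if (tp->ctt_size == SK_LSIZE_SENT)
    {
      *hdrlen = 5 * sizeof (uint32_t);
      return ((uint64_t) tp->ctt_lsizehi << 32) | tp->ctt_lsizelo;
    }
  *hdrlen = 3 * sizeof (uint32_t);
  return tp->ctt_size;
}
```

The header length matters because the variant data starts immediately after
whichever of the two structures is in use.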

Structure members use a similar trick. Almost all the time, the size of the
structure (the ctt_size) is less than 2^32 bytes, and the variable data is an
array of ctf_member_t's:

typedef struct ctf_member_v2
{
  uint32_t ctm_name;		/* Reference to name in string table.  */
  uint32_t ctm_offset;		/* Offset of this member in bits.  */
  uint32_t ctm_type;		/* Reference to type of member.  */
} ctf_member_t;

But if the structure is really huge (above CTF_LSTRUCT_THRESH bytes in length),
the ctt_size overflows the range of the ctm_offset, and every member in this
structure is instead described by the larger ctf_lmember_t:

typedef struct ctf_lmember_v2
{
  uint32_t ctlm_name;		/* Reference to name in string table.  */
  uint32_t ctlm_offsethi;	/* High 32 bits of member offset in bits.  */
  uint32_t ctlm_type;		/* Reference to type of member.  */
  uint32_t ctlm_offsetlo;	/* Low 32 bits of member offset in bits.  */
} ctf_lmember_t;

Unions are identical, and you can represent unnamed structure and union fields
as well with no extensions, by just adding members at the appropriate bit offset
in the containing struct/union (which is how unnamed structs/unions appear to
the programmer, and thus how they should appear to debuggers).
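
As a sketch of how a reader might handle both member layouts, the helper below
returns the bit offset of the i'th member.  It assumes the caller has already
decided which layout applies by comparing the structure size against
CTF_LSTRUCT_THRESH (whose value is not given here); the sk_* types are local
mirrors of the structures above, not the libctf definitions.

```c
#include <stdint.h>

/* Local mirrors of ctf_member_t and ctf_lmember_t.  */
typedef struct sk_member
{
  uint32_t ctm_name;
  uint32_t ctm_offset;
  uint32_t ctm_type;
} sk_member_t;

typedef struct sk_lmember
{
  uint32_t ctlm_name;
  uint32_t ctlm_offsethi;
  uint32_t ctlm_type;
  uint32_t ctlm_offsetlo;
} sk_lmember_t;

/* Return the offset in bits of member I in the variant data VDATA.
   IS_LARGE is nonzero if the structure uses the large member form.  */
static uint64_t
sk_member_offset (const void *vdata, uint32_t i, int is_large)
{
  if (is_large)
    {
      const sk_lmember_t *lm = (const sk_lmember_t *) vdata + i;
      return ((uint64_t) lm->ctlm_offsethi << 32) | lm->ctlm_offsetlo;
    }
  return ((const sk_member_t *) vdata + i)->ctm_offset;
}
```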


Structure members show the general theme for variant data: in most cases, the
variant data is some sort of structure, or an array of structures, or is not
present at all (things like typedefs don't have one): but function types, and
integral and floating-point types, use different sorts of vlen.  Function types
use a list of argument types with vlen / sizeof (uint32_t) members, with the
ctt_type being the return type; integer and floating-point types use flags
packed into a single uint32_t in the variant data encoding things like format,
bitness, etc:

#define CTF_INT_ENCODING(data) (((data) & 0xff000000) >> 24)
#define CTF_INT_OFFSET(data)   (((data) & 0x00ff0000) >> 16)
#define CTF_INT_BITS(data)     (((data) & 0x0000ffff))

#define CTF_INT_DATA(encoding, offset, bits) \
       (((encoding) << 24) | ((offset) << 16) | (bits))

#define CTF_INT_SIGNED	0x01	/* Integer is signed (otherwise unsigned).  */
#define CTF_INT_CHAR	0x02	/* Character display format.  */
#define CTF_INT_BOOL	0x04	/* Boolean display format.  */
#define CTF_INT_VARARGS	0x08	/* Varargs display format.  */

Or, for floats:

#define CTF_FP_ENCODING(data)  (((data) & 0xff000000) >> 24)
#define CTF_FP_OFFSET(data)    (((data) & 0x00ff0000) >> 16)
#define CTF_FP_BITS(data)      (((data) & 0x0000ffff))

#define CTF_FP_DATA(encoding, offset, bits) \
       (((encoding) << 24) | ((offset) << 16) | (bits))

/* Variant data when kind is CTF_K_FLOAT is an encoding in the top eight bits.  */
#define CTF_FP_ENCODING(data)	(((data) & 0xff000000) >> 24)

#define CTF_FP_SINGLE	1	/* IEEE 32-bit float encoding.  */
#define CTF_FP_DOUBLE	2	/* IEEE 64-bit float encoding.  */
#define CTF_FP_CPLX	3	/* Complex encoding.  */
#define CTF_FP_DCPLX	4	/* Double complex encoding.  */
#define CTF_FP_LDCPLX	5	/* Long double complex encoding.  */
#define CTF_FP_LDOUBLE	6	/* Long double encoding.  */
#define CTF_FP_INTRVL	7	/* Interval (2x32-bit) encoding.  */
#define CTF_FP_DINTRVL	8	/* Double interval (2x64-bit) encoding.  */
#define CTF_FP_LDINTRVL	9	/* Long double interval (2x128-bit) encoding.  */
#define CTF_FP_IMAGRY	10	/* Imaginary (32-bit) encoding.  */
#define CTF_FP_DIMAGRY	11	/* Long imaginary (64-bit) encoding.  */
#define CTF_FP_LDIMAGRY	12	/* Long double imaginary (128-bit) encoding.  */

#define CTF_FP_MAX	12	/* Maximum possible CTF_FP_* value */

Some of the formats, particularly in the floating-point realm, are somewhat
debatable, and we hope for discussion of what formats are appropriate (C99
complex types appear to be provided for, but not much else).
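
As a worked example of the integer encoding, the snippet below repeats the
CTF_INT_* macros quoted above and wraps CTF_INT_DATA in a small helper.  Only
the helper name sk_int_data is new and hypothetical.

```c
#include <stdint.h>

/* Macros repeated from the excerpt above.  */
#define CTF_INT_ENCODING(data) (((data) & 0xff000000) >> 24)
#define CTF_INT_OFFSET(data)   (((data) & 0x00ff0000) >> 16)
#define CTF_INT_BITS(data)     (((data) & 0x0000ffff))

#define CTF_INT_DATA(encoding, offset, bits) \
       (((encoding) << 24) | ((offset) << 16) | (bits))

#define CTF_INT_SIGNED	0x01	/* Integer is signed (otherwise unsigned).  */
#define CTF_INT_CHAR	0x02	/* Character display format.  */

/* Build the single uint32_t of variant data for an integer type.  */
static uint32_t
sk_int_data (uint32_t encoding, uint32_t offset, uint32_t bits)
{
  return CTF_INT_DATA (encoding, offset, bits);
}
```

A signed 32-bit int at offset 0 is sk_int_data (CTF_INT_SIGNED, 0, 32), which
is 0x01000020; a signed 8-bit char is
sk_int_data (CTF_INT_SIGNED | CTF_INT_CHAR, 0, 8), which is 0x03000008.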

It is notable that there are two redundant ways to encode the bitness of
bitfield types, and three redundant ways to encode their offset: you can put
either directly into the encoding, or put it into a slice, or specify the offset
via bit-specific values in the containing structure or union. libctf hides as
much of this as possible by making it appear that slices are the same kind as
the kind they point to, contributing only an encoding: the only difference
between the slice and its underlying type is that you can call
ctf_type_reference() on the slice to get that underlying type, which you cannot
do on an int.

(In the header alone, but not in the data format, there is an additional
feature: the CTF_CHAR macro is an integral type of the same signedness as the
build target's char type, turning on CTF_INT_SIGNED, or not, appropriately.)


Function info and data object sections
--------------------------------------

These two sections, taken together, map 1:1 to the symbols of type STT_OBJECT
and STT_FUNC in an ELF symbol table (usually the symbol table in the ELF object
in which the CTF section is embedded). They are generated by traversing the symbol
table, and whenever a suitable symbol is encountered, adding an entry for it
to the data object or function info sections, depending on whether this is a
STT_OBJECT or STT_FUNC symbol.

Both producer and consumer must agree on the definition of 'suitable', since
there is no cross-checking here, and if even one symbol is treated differently,
all symbols following it will be misattributed.

For both STT_FUNC and STT_OBJECT symbols, symbols whose name is _START_ or
_END_, or whose section index is SHN_UNDEF, are omitted; for STT_OBJECT symbols,
we further omit zero-valued SHN_ABS symbols.

The data object section is an array of type IDs, one entry per suitable entry in
the symbol table: each type ID is the type of the corresponding symbol.
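
The implied indexing can be sketched as follows: to find the type of a data
symbol, count how many suitable STT_OBJECT symbols precede it and use that
count as the index into the data object section.  This is an illustration
under stated assumptions, not libctf's implementation: sk_sym_t is a
simplified, hypothetical stand-in for an ELF symbol (so no elf.h is needed),
the constants carry the standard ELF values, and a return of 0 is used here as
a "no type" marker.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Standard ELF values, redefined locally so no ELF header is required.  */
enum { SK_STT_OBJECT = 1, SK_STT_FUNC = 2 };
enum { SK_SHN_UNDEF = 0, SK_SHN_ABS = 0xfff1 };

/* Simplified stand-in for an ELF symbol table entry.  */
typedef struct sk_sym
{
  const char *name;
  unsigned char type;		/* SK_STT_* */
  uint16_t shndx;		/* Section index.  */
  uint64_t value;
} sk_sym_t;

/* Apply the 'suitable' rules described in the text.  */
static int
sk_sym_suitable (const sk_sym_t *s)
{
  if (s->type != SK_STT_OBJECT && s->type != SK_STT_FUNC)
    return 0;
  if (strcmp (s->name, "_START_") == 0 || strcmp (s->name, "_END_") == 0)
    return 0;
  if (s->shndx == SK_SHN_UNDEF)
    return 0;
  if (s->type == SK_STT_OBJECT && s->shndx == SK_SHN_ABS && s->value == 0)
    return 0;
  return 1;
}

/* Return the type ID for data symbol SYMIDX, or 0 if it has no entry.  */
static uint32_t
sk_lookup_object_type (const sk_sym_t *syms, size_t nsyms,
		       const uint32_t *objt, size_t nobjt, size_t symidx)
{
  size_t slot = 0;

  if (symidx >= nsyms || syms[symidx].type != SK_STT_OBJECT
      || !sk_sym_suitable (&syms[symidx]))
    return 0;

  /* Count the suitable data symbols that come before this one.  */
  for (size_t i = 0; i < symidx; i++)
    if (syms[i].type == SK_STT_OBJECT && sk_sym_suitable (&syms[i]))
      slot++;

  return slot < nobjt ? objt[slot] : 0;
}
```

A real consumer would build the symbol-to-slot mapping once at open time rather
than rescanning for every lookup; the linear scan here just keeps the
correspondence visible.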

The function info section is an array of things that (if they were in
structures rather than just a stream of bytes) would look fairly similar to the
variant data for CTF_K_FUNCTION types, described above:

uint32_t ctt_info; # vlen is number of args
ctf_id_t ctc_return;
ctf_id_t args[vlen];

If the last arg is zero, this is a varargs function, and libctf will flip on the
CTF_FUNC_VARARG flag in the funcinfo on return.
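
A sketch of decoding one such entry is shown below.  It assumes every field is
a 32-bit word (as for v2 type IDs), takes the vlen mask from the ctt_info
diagram earlier, and does no bounds checking; sk_funcinfo_t and the other sk_*
names are hypothetical and do not correspond to libctf's own structures.

```c
#include <stddef.h>
#include <stdint.h>

/* vlen occupies bits 24..0 of ctt_info, per the diagram earlier.  */
#define SK_INFO_VLEN(info) ((uint32_t) (info) & 0x01ffffffu)

typedef struct sk_funcinfo
{
  uint32_t ret;			/* Return type ID.  */
  uint32_t argc;		/* Number of real arguments.  */
  int varargs;			/* Nonzero if a trailing zero arg was seen.  */
  const uint32_t *args;		/* Points into the section data.  */
} sk_funcinfo_t;

/* Decode the entry starting at P.  Return the number of 32-bit words it
   occupies, so the caller can step to the next entry.  */
static size_t
sk_parse_funcinfo (const uint32_t *p, sk_funcinfo_t *out)
{
  uint32_t vlen = SK_INFO_VLEN (p[0]);

  out->ret = p[1];
  out->args = p + 2;
  out->argc = vlen;
  out->varargs = 0;

  /* A zero type ID in the last slot marks a varargs function.  */
  if (vlen > 0 && p[2 + vlen - 1] == 0)
    {
      out->varargs = 1;
      out->argc = vlen - 1;
    }
  return 2 + (size_t) vlen;
}
```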


Variable info section
---------------------

This is a very simple section, an array of ctf_varent_t sorted in ascending
strcmp() order by ctv_name.  It is used for systems in which there is nothing
resembling a string table, in which address -> name lookup for data objects is
done by machinery outside the purview of CTF, and the caller wants to resolve
string names to types.  This covers data objects only: there is currently
nothing resembling the function info section with manual lookup like this.
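
Because the array is sorted by name, a lookup can be a plain binary search.
The sketch below assumes each entry holds a string table offset and a type ID
(only ctv_name is named above; ctv_type and the sk_* names are assumptions),
that all names live in the container's own string table, and that 0 can serve
as a "not found" result.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed shape of a variable entry: name offset plus type ID.  */
typedef struct sk_varent
{
  uint32_t ctv_name;		/* Offset into the string table.  */
  uint32_t ctv_type;		/* Type ID of the variable.  */
} sk_varent_t;

/* Binary-search VARS (sorted by strcmp order of name) for NAME.
   Return the variable's type ID, or 0 if there is no such variable.  */
static uint32_t
sk_lookup_variable (const sk_varent_t *vars, size_t nvars,
		    const char *strtab, const char *name)
{
  size_t lo = 0, hi = nvars;

  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;
      int cmp = strcmp (name, strtab + vars[mid].ctv_name);

      if (cmp == 0)
	return vars[mid].ctv_type;
      if (cmp < 0)
	hi = mid;
      else
	lo = mid + 1;
    }
  return 0;
}
```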


Label section
-------------

This section is an array of ctf_lblent, which can be used to tile the type space
into named regions.  It might be useful for parallel deduplicators, or to have
distinct parent containers for different regions of the type space (with names
denoted by the label), or such things.


String table
------------

This is a perfectly normal ELF string table, with a first entry which is simply
\0 (so unnamed items can be denoted by the integer 0): it is specific to the CTF
container alone.  String table references in CTF have an MSB which, when 1
(CTF_STRTAB_1), means to use a specific ELF string table (usually the one
accompanying the symbol table used for the function info and data object
sections).
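
Resolving a string reference can be sketched like this: the top bit picks the
table (0 for the container's own, 1 for the external ELF one) and the low 31
bits are the offset within it.  The SK_* macros and sk_* names are
hypothetical, introduced only for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Split a string reference into table selector and offset.  */
#define SK_NAME_STID(ref)   ((uint32_t) (ref) >> 31)
#define SK_NAME_OFFSET(ref) ((uint32_t) (ref) & 0x7fffffffu)

typedef struct sk_strs
{
  const char *data;		/* Start of the table, or NULL if absent.  */
  size_t len;			/* Length of the table in bytes.  */
} sk_strs_t;

/* TABS[0] is the CTF-internal table, TABS[1] the external ELF table.
   Return the string REF refers to, or NULL if it cannot be resolved.  */
static const char *
sk_strptr (const sk_strs_t tabs[2], uint32_t ref)
{
  const sk_strs_t *t = &tabs[SK_NAME_STID (ref)];
  uint32_t off = SK_NAME_OFFSET (ref);

  if (t->data == NULL || off >= t->len)
    return NULL;
  return t->data + off;
}
```

Returning NULL when the external table is absent covers the case of a
container opened without its accompanying ELF string table.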
Nick Clifton May 1, 2019, 4:02 p.m. | #2
Hi Nick,

  One quick question before I start to dive into this particular patch series:
  Is there a reason why you are using GPLv2 and not GPLv3 for your files ?

  The binutils project as a whole uses GPLv3, so I would be concerned about
  accepting a large number of GPLv2 source files.

Cheers
  Nick
Jose E. Marchesi May 1, 2019, 4:16 p.m. | #3
Hi Nick.

      One quick question before I start to dive into this particular patch series:
      Is there a reason why you are using GPLv2 and not GPLv3 for your files ?
    
      The binutils project as a whole uses GPLv3, so I would be concerned about
      accepting a large number of GPLv2 source files.

Doh!  We actually wanted to submit as GPLv3+, not GPLv2+.  There is no
reason to include v2, at all :)

We will change the notifications to "version 3, or any later version" in
subsequent versions of the patch series.
Joseph Myers May 2, 2019, 3:22 p.m. | #4
This patch series introduces a dependency of binutils on libctf.

This means libctf should be portable to the same range of hosts as 
binutils, all of which can be used as hosts for cross toolchains for a 
range of targets (ELF and non-ELF).  For example, it should be portable to 
hosts such as MinGW or OS X.

Some apparent portability issues in this code include:

* Use of dlfcn.h.  Such use in existing binutils code (e.g. bfd/plugin.c) 
is conditional, to avoid trying to use it on hosts without that 
functionality.

* Use of sys/mman.h.  Again, mmap usage in existing code is appropriately 
conditional.

* Use of sys/errno.h.  The standard name is errno.h.

* Use of elf.h.  Non-ELF hosts won't have such a header.  You should be 
working with the existing include/elf/*.h definitions of ELF data 
structures in binutils.

* Use of gelf.h.  This seems to be something from some versions of libelf, 
which isn't an existing build dependency of binutils at all (and given the 
existence of multiple, incompatible versions of libelf, one should be wary 
of depending on it).  The only gelf.h I have locally here is in a checkout 
of prelink sources.  Again, use existing ELF structures in headers present 
in binutils.

* Use of byteswap.h and endian.h.  Such headers are not portably 
available.  Note how byteswap.h usage in gold / elfcpp is appropriately 
conditional.

-- 
Joseph S. Myers
joseph@codesourcery.com
Nick Alcock May 3, 2019, 10:47 a.m. | #5
On 1 May 2019, Jose E. Marchesi spake thusly:

>
> Hi Nick.
>
>       One quick question before I start to dive into this particular patch series:
>       Is there a reason why you are using GPLv2 and not GPLv3 for your files ?
>
>       The binutils project as a whole uses GPLv3, so I would be concerned about
>       accepting a large number of GPLv2 source files.
>
> Doh!  We actually wanted to submit as GPLv3+, not GPLv2+.  There is no
> reason to include v2, at all :)
>
> We will change the notifications to "version 3, or any later version" in
> subsequent versions of the patch series.

Adjusted. This was just historical paranoia, really. (We'll be fine as
long as Linux kernel build-time code in scripts/ can still link with it
when it's GPLv3, and since the existing prototype is linking with
elfutils now and it's only a build tool and thus is not distributed in
linked form, we're probably fine -- or in trouble already. :) )
Nick Clifton May 3, 2019, 12:33 p.m. | #6
Hi Nick,

  OK, so I am going to stop my review of this patch series until we have 
  some answers to a few high level questions/requirements.  Specifically:

  * Hosting

    Is the binutils project the right place for this code ?  As Joseph 
    has already mentioned libctf appears to use code from the elfutils 
    project, so maybe that would be a better home for libctf ?

  * Testing

    I did not see a testsuite for the library, nor any additions to
    the binutils testsuite for the extensions to objdump and readelf.
    I would not be comfortable adding such a large body of code to the
    project without some 

  * Documentation

    It would be really good to have the CTF format documented somewhere
    (semi) permanent and publicly accessible.  It would also be good if
    there was a libctf.texi document describing how consumers are expected
    to use the library, and, ideally, providing code examples.

  * Usefulness

    This may be a bit contentious - but is the CTF format actually being
    used anywhere, or likely to be used in the near future ?  If this is
    a project that is just going to be used by a small group, or even just
    a single company, then I am worried that the code will just bit rot 
    away and not be actively maintained.

Cheers
  Nick
Nick Alcock May 3, 2019, 2:23 p.m. | #7
[looking at your comments first because you were so very helpful last
 time I contributed to glibc. :) ]

(And thank you! I haven't done quite everything you suggested, at least
not yet, but the 90% I have done is entirely beneficial and you spotted
a lot of things I overlooked.)

On 2 May 2019, Joseph Myers spake thusly:

> This patch series introduces a dependency of binutils on libctf.
>
> This means libctf should be portable to the same range of hosts as
> binutils, all of which can be used as hosts for cross toolchains for a
> range of targets (ELF and non-ELF).  For example, it should be portable to
> hosts such as MinGW or OS X.

Seems sensible. It might lose some functionality, though, at least to
start with. (Say, sharing string tables with the overarching container,
opening CTF files by handing them an fd to a containing binary, etc.
There are alternatives callers can use in all these cases.)

I'll probably arrange for the deduplicating linker plugin to be
ELF-only, at least to start with, because I have no way to test on
anything else, and it might always keep strings and symbols internal to
the CTF file rather than trying to share them with the enclosing binary,
until someone else contributes that sort of thing for non-ELF.

> Some apparent portability issues in this code include:
>
> * Use of dlfcn.h.  Such use in existing binutils code (e.g. bfd/plugin.c)
> is conditional, to avoid trying to use it on hosts without that
> functionality.

This was used by ancient code in the OpenSolaris days that endeavoured
to dlopen() zlib to avoid linking against it (why one would want to
avoid linking against zlib is opaque to me). No user these days:
dropped.

> * Use of sys/mman.h.  Again, mmap usage in existing code is appropriately
> conditional.

We can fall back to copying or malloc in that situation, in most cases.

However, the CTF archive code would be made significantly more
complicated, more than cancelling out the implementation simplicity
which was one reason for using mmap() there in the first place. So for
now my no-mmap() CTF archive code just fails: callers can detect the
failure and fall back to storing CTF containers separately in that case.
(Both reading and writing fail symmetrically, so you aren't going to end
up creating containers you then can't read.)

If there are really still platforms relevant outside industrial museums
without mmap(), we can rethink this, but I bet there aren't, or that any
such platforms aren't going to be storing huge numbers of CTF containers
in any case. (The use case for this is if you have so many TUs that you
can't store one per section without risking blowing the 64K section
limit. Any machine new enough to be dealing with anything in that size
range is going to have mmap() as well, right? Or something we can use
instead of it with similar semantics...)


Note that it's only *creating* CTF archives without mmap() that is too
horrible to countenance. It is relatively easy to support reading CTF
archives on non-mmap-supporting systems, if quite inefficiently, so we
could arrange to fall back to read-and-copy in that case, allowing
people in cross environments to not need to worry about whether their
target supports mmap() before creating CTF archives. This might be a
reasonable middle ground, perhaps?


(Added fallbacks for mmap() in all cases but CTF archives: as noted
above, we can add fallbacks for archive usage, too, just not creation.)

(oh btw you missed a bit: we use pread() too, and badly, ignoring the
possibility of short reads or -EINTR returns. Fixing, and adding a
fallback for that as well.)
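
For illustration, a pread() wrapper that copes with short reads and EINTR
could look like the sketch below.  It uses only the POSIX pread() call and
errno; the name sk_pread_full is hypothetical, and this is not the fix that
went into libctf.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Read up to COUNT bytes from FD at OFFSET, retrying after EINTR and
   continuing after short reads.  Return the number of bytes read (less
   than COUNT only at end of file), or -1 with errno set on error.  */
static ssize_t
sk_pread_full (int fd, void *buf, size_t count, off_t offset)
{
  char *p = buf;
  size_t done = 0;

  while (done < count)
    {
      ssize_t n = pread (fd, p + done, count - done, offset + (off_t) done);

      if (n < 0)
	{
	  if (errno == EINTR)
	    continue;		/* Interrupted before any data: retry.  */
	  return -1;
	}
      if (n == 0)
	break;			/* End of file.  */
      done += (size_t) n;
    }
  return (ssize_t) done;
}
```

On hosts without pread() at all, the same loop could be written with lseek()
followed by read(), at the cost of disturbing the file position.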

> * Use of sys/errno.h.  The standard name is errno.h.

Ancient historical wart: fixed, thank you! How did I miss that?!

> * Use of elf.h.  Non-ELF hosts won't have such a header.  You should be
> working with the existing include/elf/*.h definitions of ELF data
> structures in binutils.

This is all for build hosts that aren't ELF, right? I don't think we can
reasonably expect ctf_open() or ctf_fdopen() to work for anything but
raw CTF files on non-ELF hosts, given that by their very nature these
functions are getting CTF data out of ELF sections, and non-ELF formats
don't necessarily support anything like the named section concept ELF
has got at all.

The only other ELF-specificity is looking up types by symbol table
offset. Again, I don't know enough about non-ELF platforms to know if
this concept is even practical there, which would mean the data object
and function info sections would be empty on such hosts, and
ctf_lookup_by_symbol(), ctf_func_info() and ctf_func_args() would not
function or would be excluded from the set of exported symbols entirely.

This would reduce libctf's utility, but not eliminate it: external
systems can still look up types by name or CTF type ID even if they
can't do it by symbol.

It is possible that such things could be revived: all we'd need for a
given non-ELF platform would be a way to consistently split whatever
they use as a symbol table into an ordered list of data and function
objects that could be referenced by those CTF sections. However, for
now, this functionality is intrinsically ELF-only in the sense that
nobody has ever considered how it might work on non-ELF platforms and it
certainly has no users there.

However, for now we can do a little better than this: see below.

> * Use of gelf.h.  This seems to be something from some versions of libelf,
> which isn't an existing build dependency of binutils at all (and given the
> existence of multiple, incompatible versions of libelf, one should be wary
> of depending on it).  The only gelf.h I have locally here is in a checkout
> of prelink sources.  Again, use existing ELF structures in headers present
> in binutils.

This is a historical thing: libelf was of course part of Solaris so its
usage was pervasive, even when unnecessary, as here. What we're actually
using is a few datatypes, nothing more: the Elf64_Sym, from <elf.h> (on
Linux, provided by glibc), the Elf*_GHdr and Elf*_SHdr, and the
primitive ELF-sized datatypes like Elf64_Word that those structures use.

I don't see any immediate replacement for most of this stuff in
binutils, even though I'd expect to find it: the Elf64_External_Sym's
members are entirely the wrong type (char arrays), and there doesn't
seem to be any common non-architecture-dependent structure with more
useful types at all!

Elf64_Internal_Sym is very bfd-specific (and I'm trying not to have
libctf depend on libbfd unnecessarily, since it needs little of its
functionality), and the code in readelf that mocks up an internal_sym
from file data spends almost all its time getting around the problem
that its datatypes are different from the (standard-guaranteed) data
types in the ELF file itself. This is more futzing about than seems sane
given that we're not using the rest of bfd at all.

So I'd rather find a way to do the simple 'get a bit of very simple data
out of an ELF file we have an fd to (symbol lookups and a couple of
section lookups)' without needing to rejig everything to use bfd just to
do that, particularly given that libctf's APIs that involve the caller
passing info corresponding to a section into libctf do not require the
caller to use bfd and I have not the least idea how to go from
data+size-and-no-fd to a bfd_asection (it's probably not possible).


I could just copy the (fairly small number of) relevant types from
glibc's installed elf.h header into the ctf internals (the license is
compatible, after all, as is the copyright ownership), using a different
(CTF-internal) name to avoid clashes causing trouble at build time.
Would that be acceptable?
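
As a rough illustration of what such copied definitions might look like
(a sketch only: the ctf_elf64_* and CTF_* names are hypothetical, picked
purely to avoid clashing with <elf.h>), the field order and widths below
follow the ELF64 symbol table entry layout, expressed with fixed-width
types so that nothing depends on a host header:

```c
#include <stdint.h>

/* Hypothetical CTF-internal names, chosen to avoid clashing with <elf.h>.  */
typedef uint16_t ctf_elf64_half;
typedef uint32_t ctf_elf64_word;
typedef uint64_t ctf_elf64_addr;
typedef uint64_t ctf_elf64_xword;

/* Same field order and widths as an ELF64 symbol table entry.  */
typedef struct ctf_elf64_sym
{
  ctf_elf64_word st_name;    /* Offset of the name in the string table.  */
  unsigned char st_info;     /* Symbol type and binding.  */
  unsigned char st_other;    /* Symbol visibility.  */
  ctf_elf64_half st_shndx;   /* Index of the section the symbol is in.  */
  ctf_elf64_addr st_value;   /* Symbol value.  */
  ctf_elf64_xword st_size;   /* Symbol size.  */
} ctf_elf64_sym;

/* The symbol type lives in the low four bits of st_info.  */
#define CTF_ELF64_ST_TYPE(info) ((info) & 0xf)
#define CTF_STT_OBJECT 1     /* Data object.  */
#define CTF_STT_FUNC 2       /* Function.  */
```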

This lets us operate unchanged on non-ELF hosts and when not targeting
ELF, and leave this code in and even functional in that situation: it
detects ELF files by their magic number, which will presumably never
match things passed in to ctf_open() on non-ELF targets, and nothing
would ever generate contents for the function info or data object
sections on such non-ELF targets either (until we figured out how to do
so), so the ELF-specific code involved in reading those sections is also
not a problem.
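
For reference, magic-number detection of this kind needs no host ELF
header at all. A minimal sketch (sketch_is_elf is a hypothetical helper
name, not a libctf function) only has to compare the first four bytes of
the file against the ELF identification bytes:

```c
#include <stddef.h>
#include <string.h>

/* Return nonzero if BUF, of length LEN, starts with the ELF magic
   number (0x7f 'E' 'L' 'F').  */
int
sketch_is_elf (const unsigned char *buf, size_t len)
{
  static const unsigned char elf_magic[4] = { 0x7f, 'E', 'L', 'F' };

  return len >= sizeof (elf_magic)
    && memcmp (buf, elf_magic, sizeof (elf_magic)) == 0;
}
```

Anything that fails this check would then be treated as a raw CTF file
(or rejected), which is why the same code can stay enabled everywhere.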

Adding more magic numbers for more executable types is possible: if we
started handling COFF or PE or Mach-O or something like that, we would
probably soon hit a stage where it would become useful to start using
some bfd abstractions, but I think the time is not yet. (I don't know
enough about these formats to know if they even *have* named sections.)

> * Use of byteswap.h and endian.h.  Such headers are not portably
> available.  Note how byteswap.h usage in gold / elfcpp is appropriately
> conditional.

Makes sense. I can easily arrange to use code like elfcpp does in that
case.

... (done.)
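
The general shape of such a conditional is sketched below. This is not
the code from elfcpp or from the patch series: HAVE_BYTESWAP_H is
assumed to be a configure-provided macro, and the sketch_bswap_* names
are hypothetical. When the header is missing, plain shifts and masks do
the same job portably.

```c
#include <stdint.h>

#ifdef HAVE_BYTESWAP_H
/* The host provides byteswap.h: use its macros.  */
#include <byteswap.h>
#define sketch_bswap_16(v) bswap_16 (v)
#define sketch_bswap_32(v) bswap_32 (v)
#else
/* Portable fallbacks for hosts without byteswap.h.  */
static inline uint16_t
sketch_bswap_16 (uint16_t v)
{
  return (uint16_t) ((v >> 8) | (v << 8));
}

static inline uint32_t
sketch_bswap_32 (uint32_t v)
{
  return ((v & 0x000000ffu) << 24)
    | ((v & 0x0000ff00u) << 8)
    | ((v & 0x00ff0000u) >> 8)
    | ((v & 0xff000000u) >> 24);
}
#endif
```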
Florian Weimer May 3, 2019, 4:19 p.m. | #8
* Nick Alcock:

>   - a very compact association between the ELF symbol table and CTF.
>     No symbol table indexes are recorded: all are implied: the
>     data-object section is as compact as possible, containing nothing
>     but a stream of type IDs describing the type of data symbols in
>     symbol table order.

Is this for GNU/Linux?

On GNU/Linux, DWARF unwinding information (in the form of
PT_GNU_EH_FRAME) is not optional, it is required by the ABI (although
many people pretend it's not, resulting in crashes or worse).

I'm worried that we have to add both in the future, DWARF data and CTF
data, which would be rather bad.

Thanks,
Florian
Nick Alcock May 3, 2019, 7:44 p.m. | #9
On 3 May 2019, Florian Weimer verbalised:

> * Nick Alcock:
>
>>   - a very compact association between the ELF symbol table and CTF.
>>     No symbol table indexes are recorded: all are implied: the
>>     data-object section is as compact as possible, containing nothing
>>     but a stream of type IDs describing the type of data symbols in
>>     symbol table order.
>
> Is this for GNU/Linux?
>
> On GNU/Linux, DWARF unwinding information (in the form of
> PT_GNU_EH_FRAME) is not optional, it is required by the ABI (although
> many people pretend it's not, resulting in crashes or worse).
>
> I'm worried that we have to add both in the future, DWARF data and CTF
> data, which would be rather bad.

I'm fairly sure they are quite distinct. CTF doesn't even try to record
unwinding information, or anything like it, but rather lets you
introspect into datatypes (not into the call stack, not into function
arguments, but into C types). Even the backtrace section, if and when we
add it, isn't going to do what DWARF unwinding info does: its purpose is
more to let debuggers find function arguments' values and types on the
call stack in a form much smaller than DW_TAG_call_site. Wherever
possible, if we find we need something the unwinding info is providing,
we'll point into that instead, to save space.

The one overriding rule of CTF is 'never duplicate anything you can find
elsewhere'. We already use strings from the ELF string table whenever
possible: if we ever come to need unwinding info, we'll definitely use
the DWARF unwinding info, because it is not stripped from binaries, so
its presence can be relied upon.

(Also, CTF is meant to be very small, small enough to ignore and to
leave even in stripped binaries except in very constrained environments.
I'm almost certain that it'll be *much* smaller than the DWARF unwinding
information. It certainly is in all cases I've examined so far. But
you're wise to reserve judgement on this until we have GCC and linker
support in place so you can see it for yourself.)
Florian Weimer May 6, 2019, 12:07 p.m. | #10
* Nick Alcock:

> On 3 May 2019, Florian Weimer verbalised:
>
>> * Nick Alcock:
>>
>>>   - a very compact association between the ELF symbol table and CTF.
>>>     No symbol table indexes are recorded: all are implied: the
>>>     data-object section is as compact as possible, containing nothing
>>>     but a stream of type IDs describing the type of data symbols in
>>>     symbol table order.
>>
>> Is this for GNU/Linux?
>>
>> On GNU/Linux, DWARF unwinding information (in the form of
>> PT_GNU_EH_FRAME) is not optional, it is required by the ABI (although
>> many people pretend it's not, resulting in crashes or worse).
>>
>> I'm worried that we have to add both in the future, DWARF data and CTF
>> data, which would be rather bad.
>
> I'm fairly sure they are quite distinct. CTF doesn't even try to record
> unwinding information, or anything like it, but rather lets you
> introspect into datatypes (not into the call stack, not into function
> arguments, but into C types).

I don't know what happened on Friday.  I must have had a bad day.  (I
did have trouble backporting something that should have been a
straightforward fix, too.)  I suppose I somehow mixed up CTF and CFI
in my mind.  Sorry about that.

You are of course right, CTF isn't related to unwinding.

Thanks,
Florian
Nick Alcock May 6, 2019, 4:40 p.m. | #11
On 3 May 2019, Nick Clifton verbalised:

> Hi Nick,
>
>   OK, so I am going to stop my review of this patch series until we have
>   some answers to a few high level questions/requirements.  Specifically:
>
>   * Hosting
>
>     Is the binutils project the right place for this code ?  As Joseph
>     has already mentioned libctf appears to use code from the elfutils
>     project, so maybe that would be a better home for libctf ?

It uses no code from elfutils: It uses one datatype from elfutils, and
shouldn't really even use that (it should use the stuff from <elf.h>
from glibc). (Thanks for spotting that! It'll be fixed this week.)

>   * Testing
>
>     I did not see a testsuite for the library, nor any additions to
>     the binutils testsuite for the extensions to objdump and readelf.
>     I would not be comfortable adding such a large body of code to the
>     project without some

Agreed!!! Writing a testsuite is high on our priority list, but we need
compiler and linker support first. There will definitely be a testsuite
in place, probably in large part autogenerated. (The code is not at all
untested: the testing largely consists of generating CTF for the entire
Linux kernel and then letting the fairly sizeable DTrace testsuite run
over it. However, this is obviously not suitable for upstreaming into
binutils: something more systematic is needed, and will exist.)

>   * Documentation
>
>     It would be really good to have the CTF format documented somewhere
>     (semi) permanent and publicly accessible.  It would also be good if
>     there was a libctf.texi document describing how consumers are expected
>     to use the library, and, ideally, providing code examples.

Agreed!

>   * Usefulness
>
>     This may be a bit contentious - but is the CTF format actually being
>     used anywhere, or likely to be used in the near future ?  If this is
>     a project that is just going to be used by a small group, or even just
>     a single company, then I am worried that the code will just bit rot
>     away and not be actively maintained.

It would be useful for many projects, but it was not easy to adopt it,
and it has been relatively unknown outside of Solaris and FreeBSD
circles.

We have been using it for many years with the Linux kernel and DTrace.
Other projects may adopt it. It is *useful* for many other projects, but
it was quite difficult to generate CTF-format data (from DWARF using a
variety of painful-to-port Solaris tools). I know I wanted it in the
past for various other profilers I was writing, but I didn't use it
because I had no idea it existed. It seems likely to be *useful* for
debuggers and tracers and introspectors of all sorts: anything that
wants to know what type an ELF object is, or what type function
arguments or return types are, and who wants to be able to decompose
that type into its constituent pieces and chase pointers from it and
make sense of the results seems likely to be able to make use of CTF.

In particular, GDB, perf, and sysprof could definitely make use of it,
as could systemtap, rr, and honestly I'm only limited here by not
knowing the names of more obscure debugger projects. Right now these
either cannot do useful things with datatypes at all, or require all the
DWARF debuginfo to be present to do anything, and given that many of
these things are systemwide continuous debuggers, the debuginfo is
almost never present when the debugger points at a victim program and
tries to print out argument info or whatever.

I'm fairly confident that having type information for C programs as
widely and easily available as it is, for, say, Lisp programs (with a
cost, often, of only a few kilobytes) is a generally good thing. We're
making a start by adding CTF support to GCC, GNU ld and GDB.


Size-wise it really is pretty small. Let's try a few samples (with a
notably unscientific construction method). The first five numeric
columns are a count of types and CTF sizes in bytes (the sizes of
specific sections: all are uncompressed sizes except the first 'size'
number). "DWARF size" is the size of .debug_info.

                       CTF
Program          types size (uncompressed)  stringsize  typesize   DWARF size
coreutils ls     396   10324 (26216)        13148       13068      74241
GAS 2.30         1123  55748 (172731)       106079      66652      1001453
emacs 26.1.50    3546  104276 (284479)      142231      142248     3912261
X.org 1.20.3     7266  152196 (473797)      201421      272376     4163434
GhostScript 9.26 7873  181036 (538901)      243293      295608     7943132
Gtk 3.24.7       9236  208612 (620926)      328174      292752     6106925

So a shrinkage over DWARF of roughly 80%, and usually less than 5% of
the size of the binary whose types are described. With planned format
and generation improvements, I expect that a 90% shrinkage over DWARF is
probably achievable without too much effort. (More than that might
require pruning out individual unused bits, a fairly radical change.)
Nick Alcock May 8, 2019, 11:39 a.m. | #12
On 7 May 2019, Joseph Myers said:

> On Fri, 3 May 2019, Nick Alcock wrote:
>
>> > This means libctf should be portable to the same range of hosts as
>> > binutils, all of which can be used as hosts for cross toolchains for a
>> > range of targets (ELF and non-ELF).  For example, it should be portable to
>> > hosts such as MinGW or OS X.
>>
>> Seems sensible. It might lose some functionality, though, at least to
>> start with. (Say, sharing string tables with the overarching container,
>> opening CTF files by handing them an fd to a containing binary, etc.
>> There are alternatives callers can use in all these cases.)
>
> I don't think that functionality should depend on the host, though it
> might depend on the target.

I agree with all of that. We don't want cross-tools to generate
different output to non-cross tools, etc.

I guess that means I might *have* to figure out how to do the (very
limited) ELFish things currently being done with libelf with bfd
instead. (And I just noticed that bfd does in fact have useful texinfo,
so I can read that and most of my bfd fear should go away. I have no
*idea* why I thought it had no texinfo docs. Probably some out of date
fifteen year old memory fooling me or something.)

... ok with docs BFDizing this should be much less terrifying than I
thought.

I was also a bit worried about circular dependencies, but I don't think
there are any: we'll have {ld, binutils, gdb} -> libctf -> bfd, which is
fine, but no link from bfd back to libctf.

>> I'll probably arrange for the deduplicating linker plugin to be
>> ELF-only, at least to start with, because I have no way to test on
>
> A plugin is about the only thing I'd expect to depend on the host (in that
> you'd need different build system logic to build a non-ELF shared object).

Yes, except that if the host is horribly limited I'm starting to not
care about it. If mingw *does* have writable memory mappings, I'm
happy...

>> If there are really still platforms relevant outside industrial museums
>> without mmap(), we can rethink this, but I bet there aren't, or that any
>> such platforms aren't going to be storing huge numbers of CTF containers
>> in any case. (The use case for this is if you have so many TUs that you
>> can't store one per section without risking blowing the 64K section
>> limit. Any machine new enough to be dealing with anything in that size
>> range is going to have mmap() as well, right? Or something we can use
>> instead of it with similar semantics...)
>
> MinGW has similar APIs such as MapViewOfFile.  It's not mmap.

... and that's near enough I think. It looks like you can at least use
it to create files via ordinary memory I/O, just like mmap(): it has
weird restrictions but they're no harder to deal with than the
page-granularity restrictions that go with mmap() anyway. So I don't
need to rewrite all the archive code to use file I/O instead :)
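
For readers unfamiliar with the pattern, "creating a file via ordinary
memory I/O" looks roughly like the POSIX-only sketch below. It is an
illustration, not the ctf-archive code: sketch_write_via_mmap is a
hypothetical helper, and a Windows port would have to substitute
CreateFileMapping/MapViewOfFile for the mmap() call.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Write LEN bytes from DATA to PATH through a shared writable mapping.
   Returns 0 on success, -1 on failure.  */
int
sketch_write_via_mmap (const char *path, const void *data, size_t len)
{
  int fd;
  void *map;

  if (len == 0)
    return -1;			/* A zero-length mapping is not allowed.  */

  if ((fd = open (path, O_RDWR | O_CREAT | O_TRUNC, 0666)) < 0)
    return -1;

  /* The file must already be as long as the region we map.  */
  if (ftruncate (fd, (off_t) len) < 0)
    goto err;

  map = mmap (NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
  if (map == MAP_FAILED)
    goto err;

  /* From here on, writing the file is just writing memory.  */
  memcpy (map, data, len);

  /* Flush the dirty pages before unmapping.  */
  if (msync (map, len, MS_SYNC) < 0)
    {
      munmap (map, len);
      goto err;
    }

  munmap (map, len);
  close (fd);
  return 0;

 err:
  close (fd);
  return -1;
}
```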

>> reasonably expect ctf_open() or ctf_fdopen() to work for anything but
>> raw CTF files on non-ELF hosts, given that by their very nature these
>> functions are getting CTF data out of ELF sections, and non-ELF formats
>> don't necessarily support anything like the named section concept ELF
>> has got at all.
>
> I don't see why such functions should depend on whether the host is ELF at
> all.  On the target, yes, but not on the host.

Oh true. I've constantly mixed up host and target for as long as I've
been using GNU tools, and I fear that after 25 years of getting it wrong
I'm not going to start getting it right any time soon. (I don't know
why. It's not as if the names are unclear.)

I did, of course, mean 'target' there, which does mean BFDizing to drop
the elfutils dependency.

I'll look at that soon. (Most of the other issues you raised I have
already fixed locally, and the fixes will be in the next patch rev.)
Michael Matz May 8, 2019, 2:34 p.m. | #13
Hello Nick,

On Fri, 3 May 2019, Nick Clifton wrote:

>   * Usefulness
>
>     This may be a bit contentious - but is the CTF format actually being
>     used anywhere, or likely to be used in the near future ?  If this is
>     a project that is just going to be used by a small group, or even just
>     a single company, then I am worried that the code will just bit rot
>     away and not be actively maintained.

FWIW, I'd be looking forward to have CTF in our distro binaries shipped by 
default, it might obviate the need for separate -debuginfo packages, you'd 
get somewhat useful analysis capabilities even without them then.

So, I'm very happy about someone investing into getting CTF onto linux; I 
considered doing that myself on and off, just never could bring myself 
to start ;-)


Ciao,
Michael.
Nick Clifton May 8, 2019, 4:01 p.m. | #14
Hi Nick,

  Well it looks like people do want this new feature in the binutils,
  so please go ahead and submit a v2 patch set for review.  (At your
  leisure - I am in no hurry...).

  By the way, I assume that you or somebody else will be volunteering
  to become an official maintainer for libctf, once it is in the binutils
  sources ?

Cheers
  Nick
Nick Alcock May 8, 2019, 4:20 p.m. | #15
On 8 May 2019, Nick Clifton spake thusly:

> Hi Nick,
>
>   Well it looks like people do want this new feature in the binutils,

It is not mature to jump up and down and go 'wahoo' so obviously I
didn't just do that.

>   so please go ahead and submit a v2 patch set for review.  (At your
>   leisure - I am in no hurry...).

I have some improvements to do first so you're not reviewing stuff I'm
just going to change out from under you: format changes/improvements to
finalize, etc, but I am working on nothing else and once that is done it
will be posted, don't fear. (With every review comment so far resolved,
too, I hope. That's already done bar the ELF and ctf-archive mmap
stuff...)

>   By the way, I assume that you or somebody else will be volunteering
>   to become an official maintainer for libctf, once it is in the binutils
>   sources ?

Absolutely.
Pedro Alves May 24, 2019, 8:57 a.m. | #16
On 5/3/19 3:23 PM, Nick Alcock wrote:
>
>> * Use of elf.h.  Non-ELF hosts won't have such a header.  You should be
>> working with the existing include/elf/*.h definitions of ELF data
>> structures in binutils.
> This is all for build hosts that aren't ELF, right? I don't think we can
> reasonably expect ctf_open() or ctf_fdopen() to work for anything but
> raw CTF files on non-ELF hosts, given that by their very nature these
> functions are getting CTF data out of ELF sections, and non-ELF formats
> don't necessarily support anything like the named section concept ELF
> has got at all.
>
> The only other ELF-specificity is looking up types by symbol table
> offset. Again, I don't know enough about non-ELF platforms to know if
> this concept is even practical there, which would mean the data object
> and function info sections would be empty on such hosts, and
> ctf_lookup_by_symbol(), ctf_func_info() and ctf_func_args() would not
> function or would be excluded from the set of exported symbols entirely.
>
> This would reduce libctf's utility, but not eliminate it: external
> systems can still look up types by name or CTF type ID even if they
> can't do it by symbol.

Even if you only want to support CTF on ELF containers, a cross
binutils build hosted on e.g., Windows, targeting an ELF port, should
still be able to use the CTF.  That's why it is important to not rely
on host headers for ELF definitions.  It wasn't clear to me from
your remarks above whether the cross use case is considered?

Thanks,
Pedro Alves
Nick Alcock May 24, 2019, 4:04 p.m. | #17
On 24 May 2019, Pedro Alves spake thusly:

> On 5/3/19 3:23 PM, Nick Alcock wrote:
>>> * Use of elf.h.  Non-ELF hosts won't have such a header.  You should be
>>> working with the existing include/elf/*.h definitions of ELF data
>>> structures in binutils.
>> This is all for build hosts that aren't ELF, right? I don't think we can
>> reasonably expect ctf_open() or ctf_fdopen() to work for anything but
>> raw CTF files on non-ELF hosts, given that by their very nature these
>> functions are getting CTF data out of ELF sections, and non-ELF formats
>> don't necessarily support anything like the named section concept ELF
>> has got at all.
>>
>> The only other ELF-specificity is looking up types by symbol table
>> offset. Again, I don't know enough about non-ELF platforms to know if
>> this concept is even practical there, which would mean the data object
>> and function info sections would be empty on such hosts, and
>> ctf_lookup_by_symbol(), ctf_func_info() and ctf_func_args() would not
>> function or would be excluded from the set of exported symbols entirely.
>>
>> This would reduce libctf's utility, but not eliminate it: external
>> systems can still look up types by name or CTF type ID even if they
>> can't do it by symbol.
>
> Even if you only want to support CTF on ELF containers, a cross
> binutils build hosted on e.g., Windows, targeting an ELF port, should
> still be able to use the CTF.  That's why it is important to not rely
> on host headers for ELF definitions.  It wasn't clear to me from
> your remarks above whether the cross use case is considered?

My v2 posting (already out) and v3 (to be posted today, with a
bfdization layer that actually works and a bunch of interface changes to
make it more useful, and an example use in objdump) appear in my testing
to work on mingw, at least. I think that's a non-ELF-capable host.

Avoiding relying on host ELF headers is also important because the ELF
header I was unthinkingly relying on isn't portable outside glibc on
Linux and perhaps Solaris, and that's definitely not portable enough.

(We do still *need* a few lines of stuff from ELF headers, because the
low-level symbol lookup parts of libctf cannot rely on having a bfd
available, so we have to do symbol walking by hand -- but we don't need
the host to provide those few lines, and we do use BFD to get section
content etc when possible now.)
Pedro Alves May 24, 2019, 4:19 p.m. | #18
On 5/24/19 5:04 PM, Nick Alcock wrote:
> On 24 May 2019, Pedro Alves spake thusly:
>> Even if you only want to support CTF on ELF containers, a cross
>> binutils build hosted on e.g., Windows, targeting an ELF port, should
>> still be able to use the CTF.  That's why it is important to not rely
>> on host headers for ELF definitions.  It wasn't clear to me from
>> your remarks above whether the cross use case is considered?
>
> My v2 posting (already out) and v3 (to be posted today, with a
> bfdization layer that actually works and a bunch of interface changes to
> make it more useful, and an example use in objdump) appear in my testing
> to work on mingw, at least. I think that's a non-ELF-capable host.

Yes, mingw/Windows uses COFF/PE, not ELF.

>
> Avoiding relying on host ELF headers is also important because the ELF
> header I was unthinkingly relying on isn't portable outside glibc on
> Linux and perhaps Solaris, and that's definitely not portable enough.
>
> (We do still *need* a few lines of stuff from ELF headers, because the
> low-level symbol lookup parts of libctf cannot rely on having a bfd
> available, so we have to do symbol walking by hand -- but we don't need
> the host to provide those few lines, and we do use BFD to get section
> content etc when possible now.)

Awesome.  Sorry, I'm way behind on the lists.

If everything works in a cross endianness setting too (e.g., big endian
host x little endian target) then you're golden.  :-)

Thanks,
Pedro Alves
Nick Alcock May 24, 2019, 8:09 p.m. | #19
On 24 May 2019, Pedro Alves told this:

> On 5/24/19 5:04 PM, Nick Alcock wrote:
> If everything works in a cross endianness setting too (e.g., big endian
> host x little endian target) then you're golden.  :-)

It did when I tested it last, which was some weeks ago. The ctf_archive
format is fixed-endian (little-endian): everything else byteswaps at
open time if the magic number suggests it needs to, since at open time
it's reading it all in and blowing the dcache anyway and byteswapping
everything necessary costs nothing. (And means we can ignore endianness
everywhere else in the codebase.)
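
The "byteswap at open time if the magic number suggests it" idea can be
sketched as follows. Everything here is illustrative: the header layout,
the sketch_* names and the magic value are placeholders, not the real
CTF header or libctf code. The only property the technique relies on is
that the magic number differs from its own byte-swapped form.

```c
#include <stdint.h>

/* Placeholder magic for this sketch: it must not equal its own
   byte-swapped form, or the two cases could not be told apart.  */
#define SKETCH_MAGIC 0xdff2

/* Hypothetical, heavily simplified file header.  */
struct sketch_hdr
{
  uint16_t magic;		/* Identifies the format and its endianness.  */
  uint8_t version;		/* Single bytes never need swapping.  */
  uint8_t flags;
  uint32_t stroff;		/* Offset of the string table.  */
  uint32_t strsize;		/* Size of the string table.  */
};

static uint16_t
sketch_swap16 (uint16_t v)
{
  return (uint16_t) ((v >> 8) | (v << 8));
}

static uint32_t
sketch_swap32 (uint32_t v)
{
  return ((v & 0x000000ffu) << 24) | ((v & 0x0000ff00u) << 8)
    | ((v & 0x00ff0000u) >> 8) | ((v & 0xff000000u) >> 24);
}

/* Convert H to native endianness in place.  Returns 0 if it was already
   native, 1 if it was foreign-endian and has been swapped, -1 if the
   magic number matches in neither byte order.  */
int
sketch_fix_endianness (struct sketch_hdr *h)
{
  if (h->magic == SKETCH_MAGIC)
    return 0;

  if (h->magic != sketch_swap16 (SKETCH_MAGIC))
    return -1;

  /* Foreign-endian: flip every multi-byte field once, here, so that no
     later code has to think about endianness.  */
  h->magic = sketch_swap16 (h->magic);
  h->stroff = sketch_swap32 (h->stroff);
  h->strsize = sketch_swap32 (h->strsize);
  return 1;
}
```

A real reader would go on to swap the section contents in the same pass,
which is what keeps the rest of the code endian-agnostic.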

Note: the original CTF format (as seen on Solaris, FreeBSD etc) is
completely endian-ignorant. It looks like they never considered
endianness at all, so it's totally nonportable to machines of opposing
endianness. Whoops...