[000/203] Refactor expressions

Message ID 20210101214723.1784144-1-tom@tromey.com
Headers show
Series
  • Refactor expressions
Related show

Message

Tom Tromey Jan. 1, 2021, 9:44 p.m.
I have long thought that struct expression is the worst data structure
in GDB.  In fact, I've sometimes told people it is the worst data
structure I've encountered in my career.  I've wanted to rewrite it
for many years, and this year I finally found a workable plan and the
motivation to do so.  This series is the result.

The new approach is what I consider to be the most obvious data
structure to use here: a base class represents a single node of an
expression tree, and concrete subclasses implement the various
operations.  Each object maintains its own storage.  In the end,
'struct expression' is just a simple wrapper around the root node of a
tree.  The weird things in the current code -- the strange way storage
is handled, and the need to "rotate" the expression after creation --
are gone.

The series is, of course, extremely long.  This is partly due to my
desire to split it up so that each patch is reasonably understandable,
but also partly because it is just unavoidable.

Conceptually it can be divided into multiple sub-series.  I didn't
submit these separately, though, as they don't really make sense
independently.  I've already submitted any patch that was truly
separate.  (These aren't all in yet, though, and this series depends
on them.)

Here are the sub-series:

1. Split functions out from the evaluators.

   Each evaluator function is fairly long, and mixes knowledge about
   the structure of the expression (how the exp_elements are laid out)
   with knowledge of the semantics of the operation.

   This part of the series, which is about 70 patches, splits out the
   semantics of most operations into their own functions.  These
   functions can be then be reused -- which is helpful for ensuring
   that changes don't sneak in.

   In this part of the series, sometimes the newly-introduced
   functions have extraneous arguments.  This was done partly so that
   some could be reused by generic code in part 2; but also partly
   because it was just simpler to write patches by rote.

2. Introduce 'class operation' and its subclasses.

   This sub-series is around 100 patches.  It introduces the new base
   class for an expression operation, and then proceeds to add
   operations for every necessary opcode.  In some cases multiple such
   operations are needed -- for example when multiple languages
   implement an opcode in different ways.

   Some discussion of the design & tradeoffs appear in the "Introduce
   class operation" patch.

3. Convert existing code to use operations.

   This is a short sub-series of around 10 patches.  Each parser is
   converted separately, as are DTrace and SystemTap probes

4. Delete obsolete code.

   The final 20 patches or so are concerned with removing all the code
   that is now obsolete, and with doing some minor tidying of the
   result.

The overall approach here is to replace the current data structure --
but only the data structure.  I didn't generally try to clean up
things that seemed a bit weird.

I also didn't try to add new features, like async evaluation.  After
starting the project, I felt that trying to combine this refactoring
with another, different one would be too difficult.

This is all on the branch t/expression-rewrite-incr in my github.

Regression tested on x86-64 Fedora 32.  I also built it on PPC Linux
to ensure that the ppc-linux-nat.c changes would build.

Tom

Comments

Joel Brobecker Jan. 3, 2021, 7:02 a.m. | #1
Hi Tom,

On Fri, Jan 01, 2021 at 02:44:00PM -0700, Tom Tromey wrote:
> I have long thought that struct expression is the worst data structure

> in GDB.  In fact, I've sometimes told people it is the worst data

> structure I've encountered in my career.  I've wanted to rewrite it

> for many years, and this year I finally found a workable plan and the

> motivation to do so.  This series is the result.


Thanks for the (mega) series!

Because of the length of the series, I think it's going to be potentially
more difficult for you to maintain it over time. I so I think it would
be nice if we could put a priority on its review.

I've now started with part 1:

> 1. Split functions out from the evaluators.

> 

>    Each evaluator function is fairly long, and mixes knowledge about

>    the structure of the expression (how the exp_elements are laid out)

>    with knowledge of the semantics of the operation.

> 

>    This part of the series, which is about 70 patches, splits out the

>    semantics of most operations into their own functions.  These

>    functions can be then be reused -- which is helpful for ensuring

>    that changes don't sneak in.

> 

>    In this part of the series, sometimes the newly-introduced

>    functions have extraneous arguments.  This was done partly so that

>    some could be reused by generic code in part 2; but also partly

>    because it was just simpler to write patches by rote.


Aside from the commit "what is this code for", I think this part
of the series is a nice improvement on its own. independently
of redesigning struct expression, I've always found the evaluator
functions to be overly long and harder to navigate as a result.
So I think it could go in ahead of the rest if we agree that
this part is good.

For me, I've gone through the patches, more or less carefully
based on a random sample, and they look good. I paused a bit
about the Ada ones, were you excluded the hanlding of noside ==
EVAL_SKIP. I'm not entirely sure why that is, perhaps because
the block consists in a goto nosideret? Looking at what that
nosideret does, it's just...

    | nosideret:
    |   return eval_skip_value (exp);

... so we could inline this in the new functions.

However, this is really a very minor detail that doesn't need to be
addressed here.

So, to summarize, if others agree, I'm happy for this part of
the series to go in.

> 

> 2. Introduce 'class operation' and its subclasses.

> 

>    This sub-series is around 100 patches.  It introduces the new base

>    class for an expression operation, and then proceeds to add

>    operations for every necessary opcode.  In some cases multiple such

>    operations are needed -- for example when multiple languages

>    implement an opcode in different ways.

> 

>    Some discussion of the design & tradeoffs appear in the "Introduce

>    class operation" patch.

> 

> 3. Convert existing code to use operations.

> 

>    This is a short sub-series of around 10 patches.  Each parser is

>    converted separately, as are DTrace and SystemTap probes

> 

> 4. Delete obsolete code.

> 

>    The final 20 patches or so are concerned with removing all the code

>    that is now obsolete, and with doing some minor tidying of the

>    result.

> 

> The overall approach here is to replace the current data structure --

> but only the data structure.  I didn't generally try to clean up

> things that seemed a bit weird.

> 

> I also didn't try to add new features, like async evaluation.  After

> starting the project, I felt that trying to combine this refactoring

> with another, different one would be too difficult.

> 

> This is all on the branch t/expression-rewrite-incr in my github.

> 

> Regression tested on x86-64 Fedora 32.  I also built it on PPC Linux

> to ensure that the ppc-linux-nat.c changes would build.

> 

> Tom

> 


-- 
Joel
Andrew Burgess Jan. 4, 2021, 12:16 p.m. | #2
* Joel Brobecker <brobecker@adacore.com> [2021-01-03 11:02:50 +0400]:

> Hi Tom,

> 

> On Fri, Jan 01, 2021 at 02:44:00PM -0700, Tom Tromey wrote:

> > I have long thought that struct expression is the worst data structure

> > in GDB.  In fact, I've sometimes told people it is the worst data

> > structure I've encountered in my career.  I've wanted to rewrite it

> > for many years, and this year I finally found a workable plan and the

> > motivation to do so.  This series is the result.

> 

> Thanks for the (mega) series!

> 

> Because of the length of the series, I think it's going to be potentially

> more difficult for you to maintain it over time. I so I think it would

> be nice if we could put a priority on its review.

> 

> I've now started with part 1:

> 

> > 1. Split functions out from the evaluators.

> > 

> >    Each evaluator function is fairly long, and mixes knowledge about

> >    the structure of the expression (how the exp_elements are laid out)

> >    with knowledge of the semantics of the operation.

> > 

> >    This part of the series, which is about 70 patches, splits out the

> >    semantics of most operations into their own functions.  These

> >    functions can be then be reused -- which is helpful for ensuring

> >    that changes don't sneak in.

> > 

> >    In this part of the series, sometimes the newly-introduced

> >    functions have extraneous arguments.  This was done partly so that

> >    some could be reused by generic code in part 2; but also partly

> >    because it was just simpler to write patches by rote.

> 

> Aside from the commit "what is this code for", I think this part

> of the series is a nice improvement on its own. independently

> of redesigning struct expression, I've always found the evaluator

> functions to be overly long and harder to navigate as a result.

> So I think it could go in ahead of the rest if we agree that

> this part is good.

> 

> For me, I've gone through the patches, more or less carefully

> based on a random sample, and they look good. I paused a bit

> about the Ada ones, were you excluded the hanlding of noside ==

> EVAL_SKIP. I'm not entirely sure why that is, perhaps because

> the block consists in a goto nosideret? Looking at what that

> nosideret does, it's just...

> 

>     | nosideret:

>     |   return eval_skip_value (exp);

> 

> ... so we could inline this in the new functions.

> 

> However, this is really a very minor detail that doesn't need to be

> addressed here.

> 

> So, to summarize, if others agree, I'm happy for this part of

> the series to go in.


I agree with this.  I looked through all the patches 001 -> 072.
There's a few missing comments that I think should be added, and I
would suggest dropping the "what is this code for" patch (if this
needs looking at it can be a separate branch), but otherwise this
first set would be a good improvement on its own and worth merging
sooner rather than later I think.

Thanks,
Andrew



> 

> > 

> > 2. Introduce 'class operation' and its subclasses.

> > 

> >    This sub-series is around 100 patches.  It introduces the new base

> >    class for an expression operation, and then proceeds to add

> >    operations for every necessary opcode.  In some cases multiple such

> >    operations are needed -- for example when multiple languages

> >    implement an opcode in different ways.

> > 

> >    Some discussion of the design & tradeoffs appear in the "Introduce

> >    class operation" patch.

> > 

> > 3. Convert existing code to use operations.

> > 

> >    This is a short sub-series of around 10 patches.  Each parser is

> >    converted separately, as are DTrace and SystemTap probes

> > 

> > 4. Delete obsolete code.

> > 

> >    The final 20 patches or so are concerned with removing all the code

> >    that is now obsolete, and with doing some minor tidying of the

> >    result.

> > 

> > The overall approach here is to replace the current data structure --

> > but only the data structure.  I didn't generally try to clean up

> > things that seemed a bit weird.

> > 

> > I also didn't try to add new features, like async evaluation.  After

> > starting the project, I felt that trying to combine this refactoring

> > with another, different one would be too difficult.

> > 

> > This is all on the branch t/expression-rewrite-incr in my github.

> > 

> > Regression tested on x86-64 Fedora 32.  I also built it on PPC Linux

> > to ensure that the ppc-linux-nat.c changes would build.

> > 

> > Tom

> > 

> 

> -- 

> Joel
Tom Tromey Feb. 13, 2021, 7:54 p.m. | #3
>>>>> "Joel" == Joel Brobecker <brobecker@adacore.com> writes:


Joel> For me, I've gone through the patches, more or less carefully
Joel> based on a random sample, and they look good. I paused a bit
Joel> about the Ada ones, were you excluded the hanlding of noside ==
Joel> EVAL_SKIP. I'm not entirely sure why that is, perhaps because
Joel> the block consists in a goto nosideret? Looking at what that
Joel> nosideret does, it's just...

Joel>     | nosideret:
Joel>     |   return eval_skip_value (exp);

Joel> ... so we could inline this in the new functions.

There's no need for EVAL_SKIP in the new code.  This was just an
artifact of how the expression objects were laid out in memory.

It's possible that this series has intermediate states that fail in some
way.  I didn't test every patch in isolation.  I suppose I should do
that, I have just been avoiding it since on my home machine, a test run
is ~15 minutes, which works out to 50 hours for this series.

I'm ready to send v2 of the series.  I think I'll save the individual
regression test and fixes for v3.

Tom
Tom Tromey Feb. 16, 2021, 4:17 p.m. | #4
>>>>> "Tom" == Tom Tromey <tom@tromey.com> writes:


Tom> I'm ready to send v2 of the series.  I think I'll save the individual
Tom> regression test and fixes for v3.

I wrote a script to check out each version, build it, and
regression-test it.  Then I let it run for a couple of days.

It found a few patches that did not build.  Those were all patches where
some hunk was misplaced into a later patch -- nothing serious, just
stuff like missing #includes.  I've fixed all these problems on my
branch.

It also found that there are patches that introduce regressions.  I
haven't looked into those yet.  They occur near the end of the series,
which is both (a) good and (b) expected, since the early parts of the
series are pretty innocuous.

I'll dig through these before sending v3.

Tom