Avoid excessive location expansion in assign_discriminators

Message ID alpine.LSU.2.20.1806051149210.5043@zhemvz.fhfr.qr
State New
Headers show
Series
  • Avoid excessive location expansion in assign_discriminators
Related show

Commit Message

Richard Biener June 5, 2018, 9:58 a.m.
On testcases like that from PR60243 CFG build is dominated by
assign_discriminators because it expands locations again and again
and this got more expensive over the time.

Cary - can you explain the overall logic of assign_discriminators,
specifically why if the last stmt of a block has UNKNOWN_LOCATION
we don't need any discriminator?  That last stmt inherits the last
location from a previous stmt with location.  Also why
do we look at e->dest _last_ stmt in addition to the first stmt?

If I understand correctly the assignment is done at CFG construction
time rather than location emission time because we want it done
on a representation close to source?  So if the last stmt has
the same line then the first should have as well?

Or, to ask a different question - why is this done so early on
a non-optimized CFG rather than late on RTL right before we
emit .loc directives?

It would be nice if you could expand the comment before
assign_discriminators in some way addressing the above.

The patch below cuts the time spent in assign_discriminators
down by a factor of 2.5.  There's obvious cleanup possibilities
for the pointer hash-table given we should rather embed the
int, int pair rather than allocating it on the heap.  Couldn't
figure out the appropriate hash-traits quickly enough though.

Bootstrap / regtest running on x86_64-unknown-linux-gnu.

Thanks,
Richard.

2018-06-05  Richard Biener  <rguenther@suse.de>

	* tree-cfg.c (struct locus_discrim_map): Store line, not location.
	(locus_discrim_hasher::hash): Adjust.
	(locus_discrim_hasher::equal): Likewise.
	(next_discriminator_for_locus): Work on line directly.
	(same_line_p): Pass in expanded locus1 as well.
	(assign_discriminators): Avoid redundant location expansions.

Comments

Cary Coutant June 19, 2018, 8:19 p.m. | #1
> On testcases like that from PR60243 CFG build is dominated by

> assign_discriminators because it expands locations again and again

> and this got more expensive over the time.

>

> Cary - can you explain the overall logic of assign_discriminators,

> specifically why if the last stmt of a block has UNKNOWN_LOCATION

> we don't need any discriminator?  That last stmt inherits the last

> location from a previous stmt with location.  Also why

> do we look at e->dest _last_ stmt in addition to the first stmt?


Sorry, it's been a long time since I looked at this. I think that test
for UNKNOWN_LOCATION is just a way of punting on an unexpected
condition; the rest of the function won't work unless we have a valid
locus to start with.

> If I understand correctly the assignment is done at CFG construction

> time rather than location emission time because we want it done

> on a representation close to source?  So if the last stmt has

> the same line then the first should have as well?


As for why we look at last_stmt as well as first_stmt, that has to do
with the way for loops are expanded. See my explanation here:

   https://gcc.gnu.org/ml/gcc-patches/2009-07/msg01450.html

> Or, to ask a different question - why is this done so early on

> a non-optimized CFG rather than late on RTL right before we

> emit .loc directives?


It has to be done early because we need to have discriminators
assigned for the tree_profile pass, in order to use profile feedback
from an earlier run.

> It would be nice if you could expand the comment before

> assign_discriminators in some way addressing the above.

>

> The patch below cuts the time spent in assign_discriminators

> down by a factor of 2.5.  There's obvious cleanup possibilities

> for the pointer hash-table given we should rather embed the

> int, int pair rather than allocating it on the heap.  Couldn't

> figure out the appropriate hash-traits quickly enough though.


This looks good, except won't the hash table now compare equal for two
different locations where the line is the same, but the file is not?

In the Google branches, we improved discriminator assignment quite a
bit by tracking per instruction instead of per basic block. Here's my
original patch to do that:

   https://gcc.gnu.org/ml/gcc-patches/2009-11/msg00563.html

Your improvement to hoist the expansion of locus_e would still be useful there.

Unfortunately, I never had the chance to update that patch to preserve
the discriminator info across LTO streaming, which is why it remained
only in the Google branches.

There were a few follow-on patches; the last backport to the
google/gcc-4_9 branch combined most of them:

   https://gcc.gnu.org/ml/gcc-patches/2014-05/msg00869.html

and I think there was one more discriminator patch after that:

   https://gcc.gnu.org/ml/gcc-patches/2014-08/msg02671.html

-cary
Richard Biener June 20, 2018, 7:56 a.m. | #2
On Tue, 19 Jun 2018, Cary Coutant wrote:

> > On testcases like that from PR60243 CFG build is dominated by

> > assign_discriminators because it expands locations again and again

> > and this got more expensive over the time.

> >

> > Cary - can you explain the overall logic of assign_discriminators,

> > specifically why if the last stmt of a block has UNKNOWN_LOCATION

> > we don't need any discriminator?  That last stmt inherits the last

> > location from a previous stmt with location.  Also why

> > do we look at e->dest _last_ stmt in addition to the first stmt?

> 

> Sorry, it's been a long time since I looked at this. I think that test

> for UNKNOWN_LOCATION is just a way of punting on an unexpected

> condition; the rest of the function won't work unless we have a valid

> locus to start with.

> 

> > If I understand correctly the assignment is done at CFG construction

> > time rather than location emission time because we want it done

> > on a representation close to source?  So if the last stmt has

> > the same line then the first should have as well?

> 

> As for why we look at last_stmt as well as first_stmt, that has to do

> with the way for loops are expanded. See my explanation here:

> 

>    https://gcc.gnu.org/ml/gcc-patches/2009-07/msg01450.html


Ah, OK.  I guess together with the fact that we're working on
an unoptimized CFG that makes sense.  It basically assumes
we have no line back-and-forth jumping at this stage which I'm
not 100% sure is a valid assumption.

Likewise we shouldn't have UNKNOWN_LOCATION stmts at this point
(but again I'm quite sure there are a few cases where we do...).

> > Or, to ask a different question - why is this done so early on

> > a non-optimized CFG rather than late on RTL right before we

> > emit .loc directives?

> 

> It has to be done early because we need to have discriminators

> assigned for the tree_profile pass, in order to use profile feedback

> from an earlier run.


OK.

> > It would be nice if you could expand the comment before

> > assign_discriminators in some way addressing the above.

> >

> > The patch below cuts the time spent in assign_discriminators

> > down by a factor of 2.5.  There's obvious cleanup possibilities

> > for the pointer hash-table given we should rather embed the

> > int, int pair rather than allocating it on the heap.  Couldn't

> > figure out the appropriate hash-traits quickly enough though.

> 

> This looks good, except won't the hash table now compare equal for two

> different locations where the line is the same, but the file is not?


This is what the previous state did as well, it just compared
(and hashed) LOCATION_LINE:

inline hashval_t
 locus_discrim_hasher::hash (const locus_discrim_map *item)
 {
-  return LOCATION_LINE (item->locus);
+  return item->location_line;
 }

locus_discrim_hasher::equal (const locus_discrim_map *a,
                             const locus_discrim_map *b)
 {
-  return LOCATION_LINE (a->locus) == LOCATION_LINE (b->locus);
+  return a->location_line == b->location_line;
 }

maybe that wasn't intended (or it was changed in previous
revs from sth more sensible).

> In the Google branches, we improved discriminator assignment quite a

> bit by tracking per instruction instead of per basic block. Here's my

> original patch to do that:

> 

>    https://gcc.gnu.org/ml/gcc-patches/2009-11/msg00563.html


That somehow suggests assigning discriminators after optimizing
would help.  I don't see discriminator being used in the profile
code in the FSF tree btw., just some comments in auto-profile.c
that it could be used.

So I guess the issue is that you need to match the CFG at profile
reading time with the line-numbers associated to the final
assembly for sample-based profiling.  That indeed sounds like
a hard^Wimpossible task.  And indeed sth on a basic-block level
isn't going to help very much.

At least it suggests that we want to assign discriminators at
the same point we'll later read the sample-based profile rather
than at CFG construction.  There are quite a number of optimizations
run, including inlining, before pass_ipa_auto_profile.

Btw, does this mean that discriminators (and the compile-time used
to compute them) are not necessary if sample-based profiling isn't
used?

Thanks,
Richard.

Patch

diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index 4a1b2bef570..21b3fdffa59 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -110,7 +110,7 @@  struct replace_decls_d
 /* Hash table to store last discriminator assigned for each locus.  */
 struct locus_discrim_map
 {
-  location_t locus;
+  int location_line;
   int discriminator;
 };
 
@@ -129,7 +129,7 @@  struct locus_discrim_hasher : free_ptr_hash <locus_discrim_map>
 inline hashval_t
 locus_discrim_hasher::hash (const locus_discrim_map *item)
 {
-  return LOCATION_LINE (item->locus);
+  return item->location_line;
 }
 
 /* Equality function for the locus-to-discriminator map.  A and B
@@ -139,7 +139,7 @@  inline bool
 locus_discrim_hasher::equal (const locus_discrim_map *a,
 			     const locus_discrim_map *b)
 {
-  return LOCATION_LINE (a->locus) == LOCATION_LINE (b->locus);
+  return a->location_line == b->location_line;
 }
 
 static hash_table<locus_discrim_hasher> *discriminator_per_locus;
@@ -1168,21 +1168,20 @@  gimple_find_sub_bbs (gimple_seq seq, gimple_stmt_iterator *gsi)
    profiling.  */
 
 static int
-next_discriminator_for_locus (location_t locus)
+next_discriminator_for_locus (int line)
 {
   struct locus_discrim_map item;
   struct locus_discrim_map **slot;
 
-  item.locus = locus;
+  item.location_line = line;
   item.discriminator = 0;
-  slot = discriminator_per_locus->find_slot_with_hash (
-      &item, LOCATION_LINE (locus), INSERT);
+  slot = discriminator_per_locus->find_slot_with_hash (&item, line, INSERT);
   gcc_assert (slot);
   if (*slot == HTAB_EMPTY_ENTRY)
     {
       *slot = XNEW (struct locus_discrim_map);
       gcc_assert (*slot);
-      (*slot)->locus = locus;
+      (*slot)->location_line = line;
       (*slot)->discriminator = 0;
     }
   (*slot)->discriminator++;
@@ -1192,23 +1191,22 @@  next_discriminator_for_locus (location_t locus)
 /* Return TRUE if LOCUS1 and LOCUS2 refer to the same source line.  */
 
 static bool
-same_line_p (location_t locus1, location_t locus2)
+same_line_p (location_t locus1, expanded_location *from, location_t locus2)
 {
-  expanded_location from, to;
+  expanded_location to;
 
   if (locus1 == locus2)
     return true;
 
-  from = expand_location (locus1);
   to = expand_location (locus2);
 
-  if (from.line != to.line)
+  if (from->line != to.line)
     return false;
-  if (from.file == to.file)
+  if (from->file == to.file)
     return true;
-  return (from.file != NULL
+  return (from->file != NULL
           && to.file != NULL
-          && filename_cmp (from.file, to.file) == 0);
+          && filename_cmp (from->file, to.file) == 0);
 }
 
 /* Assign discriminators to each basic block.  */
@@ -1228,17 +1226,23 @@  assign_discriminators (void)
       if (locus == UNKNOWN_LOCATION)
 	continue;
 
+      expanded_location locus_e = expand_location (locus);
+
       FOR_EACH_EDGE (e, ei, bb->succs)
 	{
 	  gimple *first = first_non_label_stmt (e->dest);
 	  gimple *last = last_stmt (e->dest);
-	  if ((first && same_line_p (locus, gimple_location (first)))
-	      || (last && same_line_p (locus, gimple_location (last))))
+	  if ((first && same_line_p (locus, &locus_e,
+				     gimple_location (first)))
+	      || (last && same_line_p (locus, &locus_e,
+				       gimple_location (last))))
 	    {
 	      if (e->dest->discriminator != 0 && bb->discriminator == 0)
-		bb->discriminator = next_discriminator_for_locus (locus);
+		bb->discriminator
+		  = next_discriminator_for_locus (locus_e.line);
 	      else
-		e->dest->discriminator = next_discriminator_for_locus (locus);
+		e->dest->discriminator
+		  = next_discriminator_for_locus (locus_e.line);
 	    }
 	}
     }