gcov: reduce code quality loss by reproducible topn merging [PR92924]

Message ID 20200130161344.GD57693@kam.mff.cuni.cz
State New
Headers show
Series
  • gcov: reduce code quality loss by reproducible topn merging [PR92924]
Related show

Commit Message

Jan Hubicka Jan. 30, 2020, 4:13 p.m.
Hi,
this patch implements more careful merging of topn values (tracking regression
we got by the reproducible profiling work).  Instead of giving up on the
counter on the overflow we use sign bits to track
 1) if there are some vlaues lost
 2) if given target was not having probability at least 1/TOPN in some run.
This makes it possible to trust more profiled values and also consumer can
decide whether he wants reproducibility at all.

Bootstrapped/regtested x86_64-linux. The patch makes small improvement to
profile precision but it will make it easy to implement command line option
turning off the profile reproducibility.  It turns out that it reduces
the precision of profile info quite a lot in practice.  In particular with
nonreproducible profiling we speculatively inline 26% more indirect calls
that without in cc1plus.

Martin, I welcome your opinion on the patch :)

lto-profile-bootstrapped/regtested x86_64-linux.  I am going to test
this on Firefox and clang and gather updated logs.

2020-01-30  Jan Hubicka  <hubicka@ucw.cz>

	PR ipa/92924
	* value-prof.c (dump_histogram_value): Update dumping.
	(stream_out_histogram_value): Do not check that values are positive for
	TOPN
	(get_nth_most_common_value): Update comment and handling of sign bits.

libgcc/ChangeLog:

2020-01-30  Jan Hubicka  <hubicka@ucw.cz>

	PR ipa/92924
	* libgcov-merge.c (merge_topn_values_set): Do not invalidate whole
	counter when it is too full, but track info in signs of counts.

Comments

Martin Liška Feb. 5, 2020, 1:54 p.m. | #1
On 1/30/20 5:13 PM, Jan Hubicka wrote:
> Martin, I welcome your opinion on the patch


Ok, recently we've made quite some experiments with Honza about
TOP N counters. Based on the numbers, it shows that tracking a negative
value per-counter value does not help much. So that I suggest to use
only one global flag (a negative total) in order to distinguish in between
reproducible and non-reproducible profile.

I'm sending patch that simplifies the merging and introduces a new option.

For the future, we may (similarly to LLVM) come up with a dynamic allocation
for TOPN counters. I've git a working patch for that.

Martin
From acf77e7ebba2a4f4370388f83901d67cc71e5a0c Mon Sep 17 00:00:00 2001
From: Martin Liska <mliska@suse.cz>

Date: Wed, 5 Feb 2020 14:44:43 +0100
Subject: [PATCH] Introduce -fprofile-reproducible and support it with TOP N.

gcc/ChangeLog:

2020-02-05  Martin Liska  <mliska@suse.cz>

	PR ipa/92924
	* common.opt: Add -fprofile-reproducible.
	* doc/invoke.texi: Document it.
	* value-prof.c (dump_histogram_value):
	Document and support behavior for counters[0]
	being a negative value.
	(get_nth_most_common_value): Handle negative
	counters[0] in respect to flag_profile_reproducible.

libgcc/ChangeLog:

2020-02-05  Martin Liska  <mliska@suse.cz>

	PR ipa/92924
	* libgcov-merge.c (merge_topn_values_set): Record
	when a TOP N counter becomes invalid.  When merging
	remove a smallest value if the space is needed.
---
 gcc/common.opt         |  4 ++++
 gcc/doc/invoke.texi    | 12 +++++++++-
 gcc/value-prof.c       | 26 ++++++++++++++++------
 libgcc/libgcov-merge.c | 50 +++++++++++++++++++++++++++++-------------
 4 files changed, 69 insertions(+), 23 deletions(-)

diff --git a/gcc/common.opt b/gcc/common.opt
index 5692cd04374..7f643516a62 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2168,6 +2168,10 @@ fprofile-exclude-files=
 Common Joined RejectNegative Var(flag_profile_exclude_files)
 Instrument only functions from files where names do not match all the regular expressions (separated by a semi-colon).
 
+fprofile-reproducible
+Common Report Var(flag_profile_reproducible)
+Enable only profile based optimizations that do not depend on order of training runs.
+
 Enum
 Name(profile_update) Type(enum profile_update) UnknownError(unknown profile update method %qs)
 
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 35b341e759f..6725c543c3b 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -562,7 +562,7 @@ Objective-C and Objective-C++ Dialects}.
 -fprofile-abs-path @gol
 -fprofile-dir=@var{path}  -fprofile-generate  -fprofile-generate=@var{path} @gol
 -fprofile-note=@var{path}  -fprofile-update=@var{method} @gol
--fprofile-filter-files=@var{regex}  -fprofile-exclude-files=@var{regex} @gol
+-fprofile-filter-files=@var{regex}  -fprofile-exclude-files=@var{regex} -fprofile-reproducibly @gol
 -fsanitize=@var{style}  -fsanitize-recover  -fsanitize-recover=@var{style} @gol
 -fasan-shadow-offset=@var{number}  -fsanitize-sections=@var{s1},@var{s2},... @gol
 -fsanitize-undefined-trap-on-error  -fbounds-check @gol
@@ -13324,6 +13324,16 @@ all the regular expressions (separated by a semi-colon).
 For example, @option{-fprofile-exclude-files=/usr/*} will prevent instrumentation
 of all files that are located in @file{/usr/} folder.
 
+Enable only profile based optimizations (PGO) that do not depend on order of training runs.
+
+The PGO optimizations depend on a run-time profile that is always merged after
+each training run.  The merged profile can end up being different based on
+sequence of these training runs.  Using the option, the compiler will use
+only the profile information which cannot be changed by order of training runs.
+
+@item -fprofile-reproducible
+@opindex fprofile-reproducible
+
 @item -fsanitize=address
 @opindex fsanitize=address
 Enable AddressSanitizer, a fast memory error detector.
diff --git a/gcc/value-prof.c b/gcc/value-prof.c
index f0456c8e93d..64b9c9de6dd 100644
--- a/gcc/value-prof.c
+++ b/gcc/value-prof.c
@@ -265,8 +265,10 @@ dump_histogram_value (FILE *dump_file, histogram_value hist)
 		    ? "Top N value counter" : "Indirect call counter"));
 	  if (hist->hvalue.counters)
 	    {
-	      fprintf (dump_file, " all: %" PRId64 ", values: ",
-		       (int64_t) hist->hvalue.counters[0]);
+	      fprintf (dump_file, " all: %" PRId64 "%s, values: ",
+		       abs ((int64_t) hist->hvalue.counters[0]),
+		       hist->hvalue.counters[0] < 0
+		       ? " (values missing)": "");
 	      for (unsigned i = 0; i < GCOV_TOPN_VALUES; i++)
 		{
 		  fprintf (dump_file, "[%" PRId64 ":%" PRId64 "]",
@@ -719,26 +721,36 @@ gimple_divmod_fixed_value (gassign *stmt, tree value, profile_probability prob,
 
 /* Return the n-th value count of TOPN_VALUE histogram.  If
    there's a value, return true and set VALUE and COUNT
-   arguments.  */
+   arguments.
+
+   Counters have the following meaning.
+
+   abs (counters[0]) is the number of executions
+   for i in 0 ... TOPN-1
+     counters[2 * i + 1] is target
+     abs (counters[2 * i + 2]) is corresponding hitrate counter.
+
+   Value of counters[0] negative when counter became
+   full during merging and some values are lost.  */
 
 bool
 get_nth_most_common_value (gimple *stmt, const char *counter_type,
 			   histogram_value hist, gcov_type *value,
 			   gcov_type *count, gcov_type *all, unsigned n)
 {
-  if (hist->hvalue.counters[2] == -1)
-    return false;
-
   gcc_assert (n < GCOV_TOPN_VALUES);
 
   *count = 0;
   *value = 0;
 
-  gcov_type read_all = hist->hvalue.counters[0];
+  gcov_type read_all = abs (hist->hvalue.counters[0]);
 
   gcov_type v = hist->hvalue.counters[2 * n + 1];
   gcov_type c = hist->hvalue.counters[2 * n + 2];
 
+  if (flag_profile_reproducible && hist->hvalue.counters[0] < 0)
+    return false;
+
   /* Indirect calls can't be verified.  */
   if (stmt
       && check_counter (stmt, counter_type, &c, &read_all,
diff --git a/libgcc/libgcov-merge.c b/libgcc/libgcov-merge.c
index 19b8ee72ae9..c0785b0bf10 100644
--- a/libgcc/libgcov-merge.c
+++ b/libgcc/libgcov-merge.c
@@ -86,36 +86,47 @@ __gcov_merge_time_profile (gcov_type *counters, unsigned n_counters)
 
 #ifdef L_gcov_merge_topn
 
+/* To merging of TOPN profiles.
+   counters[0] is the number of executions
+   for i in 0 ... TOPN-1
+     counters[2 * i + 1] is target
+     counters[2 * i + 2] is corresponding hitrate counter.
+
+   Because we prune counters only those with probability >= 1/TOPN are
+   present now.
+
+   We use sign of counters[0] to track whether the number of different
+   targets exceeds TOPN.  */
+
 static void
 merge_topn_values_set (gcov_type *counters)
 {
   /* First value is number of total executions of the profiler.  */
-  gcov_type all = gcov_get_counter_ignore_scaling (-1);
-  counters[0] += all;
+  gcov_type all = gcov_get_counter ();
+  gcov_type *total = &counters[0];
   ++counters;
 
+  /* Negative value means that counter is missing some of values.  */
+  if (all < 0)
+    *total = -(*total);
+
+  *total += all;
+
   /* Read all part values.  */
   gcov_type read_counters[2 * GCOV_TOPN_VALUES];
-
   for (unsigned i = 0; i < GCOV_TOPN_VALUES; i++)
     {
       read_counters[2 * i] = gcov_get_counter_target ();
       read_counters[2 * i + 1] = gcov_get_counter_ignore_scaling (-1);
     }
 
-  if (read_counters[1] == -1)
-    {
-      counters[1] = -1;
-      return;
-    }
-
   for (unsigned i = 0; i < GCOV_TOPN_VALUES; i++)
     {
       if (read_counters[2 * i + 1] == 0)
 	continue;
 
       unsigned j;
-      int slot = -1;
+      int slot = 0;
 
       for (j = 0; j < GCOV_TOPN_VALUES; j++)
 	{
@@ -124,13 +135,15 @@ merge_topn_values_set (gcov_type *counters)
 	      counters[2 * j + 1] += read_counters[2 * i + 1];
 	      break;
 	    }
-	  else if (counters[2 * j + 1] == 0)
+	  else if (counters[2 * j + 1] < counters[2 * slot + 1])
 	    slot = j;
 	}
 
       if (j == GCOV_TOPN_VALUES)
 	{
-	  if (slot > 0)
+	  gcov_type slot_count = counters[2 * slot + 1];
+	  /* We found an empty slot.  */
+	  if (slot_count == 0)
 	    {
 	      /* If we found empty slot, add the value.  */
 	      counters[2 * slot] = read_counters[2 * i];
@@ -138,9 +151,16 @@ merge_topn_values_set (gcov_type *counters)
 	    }
 	  else
 	    {
-	      /* We haven't found a slot, bail out.  */
-	      counters[1] = -1;
-	      return;
+	      /* Here we are loosing some values.  */
+	      if (*total >= 0)
+		*total = -(*total);
+	      if (read_counters[2 * i + 1] > slot_count)
+		{
+		  counters[2 * slot] = read_counters[2 * i];
+		  counters[2 * slot + 1] = read_counters[2 * i + 1];
+		}
+	      else
+		counters[2 * slot + 1] -= read_counters[2 * i + 1];
 	    }
 	}
     }
-- 
2.25.0
Jan Hubicka Feb. 13, 2020, 1:35 p.m. | #2
> On 1/30/20 5:13 PM, Jan Hubicka wrote:

> > Martin, I welcome your opinion on the patch

> 

> Ok, recently we've made quite some experiments with Honza about

> TOP N counters. Based on the numbers, it shows that tracking a negative

> value per-counter value does not help much. So that I suggest to use

> only one global flag (a negative total) in order to distinguish in between

> reproducible and non-reproducible profile.

> 

> I'm sending patch that simplifies the merging and introduces a new option.

> 

> For the future, we may (similarly to LLVM) come up with a dynamic allocation

> for TOPN counters. I've git a working patch for that.

> 

> Martin


> From acf77e7ebba2a4f4370388f83901d67cc71e5a0c Mon Sep 17 00:00:00 2001

> From: Martin Liska <mliska@suse.cz>

> Date: Wed, 5 Feb 2020 14:44:43 +0100

> Subject: [PATCH] Introduce -fprofile-reproducible and support it with TOP N.

> 

> gcc/ChangeLog:

> 

> 2020-02-05  Martin Liska  <mliska@suse.cz>

> 

> 	PR ipa/92924

> 	* common.opt: Add -fprofile-reproducible.

> 	* doc/invoke.texi: Document it.

> 	* value-prof.c (dump_histogram_value):

> 	Document and support behavior for counters[0]

> 	being a negative value.

> 	(get_nth_most_common_value): Handle negative

> 	counters[0] in respect to flag_profile_reproducible.

> 

> libgcc/ChangeLog:

> 

> 2020-02-05  Martin Liska  <mliska@suse.cz>

> 

> 	PR ipa/92924

> 	* libgcov-merge.c (merge_topn_values_set): Record

> 	when a TOP N counter becomes invalid.  When merging

> 	remove a smallest value if the space is needed.


I think the patch is fine except for the command line option name.
First the profiling is reproducible as long as you do not run streaming
in parallel and second we will probably eventually run into need of some
kind of reproducibility for multi-threade dprograms.

So what about
 -fprofile-reproducibility=serial
 -fprofile-reproducibility=parallel-runs (or parallel-merging)
and in future
 -fprofile-reproducibility=multithreaded

It will make users to read manual what it odes really mean.
> ---

>  gcc/common.opt         |  4 ++++

>  gcc/doc/invoke.texi    | 12 +++++++++-

>  gcc/value-prof.c       | 26 ++++++++++++++++------

>  libgcc/libgcov-merge.c | 50 +++++++++++++++++++++++++++++-------------

>  4 files changed, 69 insertions(+), 23 deletions(-)

> 

> diff --git a/gcc/common.opt b/gcc/common.opt

> index 5692cd04374..7f643516a62 100644

> --- a/gcc/common.opt

> +++ b/gcc/common.opt

> @@ -2168,6 +2168,10 @@ fprofile-exclude-files=

>  Common Joined RejectNegative Var(flag_profile_exclude_files)

>  Instrument only functions from files where names do not match all the regular expressions (separated by a semi-colon).

>  

> +fprofile-reproducible

> +Common Report Var(flag_profile_reproducible)

> +Enable only profile based optimizations that do not depend on order of training runs.

> +

>  Enum

>  Name(profile_update) Type(enum profile_update) UnknownError(unknown profile update method %qs)

>  

> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi

> index 35b341e759f..6725c543c3b 100644

> --- a/gcc/doc/invoke.texi

> +++ b/gcc/doc/invoke.texi

> @@ -562,7 +562,7 @@ Objective-C and Objective-C++ Dialects}.

>  -fprofile-abs-path @gol

>  -fprofile-dir=@var{path}  -fprofile-generate  -fprofile-generate=@var{path} @gol

>  -fprofile-note=@var{path}  -fprofile-update=@var{method} @gol

> --fprofile-filter-files=@var{regex}  -fprofile-exclude-files=@var{regex} @gol

> +-fprofile-filter-files=@var{regex}  -fprofile-exclude-files=@var{regex} -fprofile-reproducibly @gol

>  -fsanitize=@var{style}  -fsanitize-recover  -fsanitize-recover=@var{style} @gol

>  -fasan-shadow-offset=@var{number}  -fsanitize-sections=@var{s1},@var{s2},... @gol

>  -fsanitize-undefined-trap-on-error  -fbounds-check @gol

> @@ -13324,6 +13324,16 @@ all the regular expressions (separated by a semi-colon).

>  For example, @option{-fprofile-exclude-files=/usr/*} will prevent instrumentation

>  of all files that are located in @file{/usr/} folder.

>  

> +Enable only profile based optimizations (PGO) that do not depend on order of training runs.

> +

> +The PGO optimizations depend on a run-time profile that is always merged after

> +each training run.  The merged profile can end up being different based on

> +sequence of these training runs.  Using the option, the compiler will use

> +only the profile information which cannot be changed by order of training runs.

> +

> +@item -fprofile-reproducible

> +@opindex fprofile-reproducible


Isn't this backwards (i.e. @item first and descrition later).
I think this is bit hard to understand feature so we should explain it
bit more including the surprises one can run into WRT reproducibility.

-fprofile-reproducibility=
Control level of reproducibility of profile gathered by
@code{-fprofile-generate}.  This makes it possible to rebuild program
with same outcome which is useful, for example, for distribution
packages.

With @option{-fprofile-reproducibility=serial} the profile gathered by
@option{-fprofile-generate} is reproducible provided the trained program
behaves the same at each invocation of the train run, it is not
multi-threaded and profile data streaming is always done in the same
order. Note that profile streaming happens at the end of program run but
also before @code{fork} function is invoked.  

Note that it is quite common that execution counts of some part of
programs depends, for example, on length of temporary file names or
memory space randomization (that may affect hash-table collision rate).
Such non-reproducible part of programs may be annotated by
@code{no_instrument_function} function attribute. @code{gcov-dump} with
@option{-l} can be used to dump gathered data and verify that they are
indeed reproducible.

With @option{-fprofile-reproducibility=parallel-runs} collected profile
stays reproducible regardless the order of streaming of the data into
gcda files.  This setting makes it possible to run multiple instances of
instrumente program in parallel (such as with @code{make -j}). This
reduces quality of gathered data, in particular of indirect call
profiling.

Otherwise the patch is OK and thanks for working on this!
Honza
> +

>  @item -fsanitize=address

>  @opindex fsanitize=address

>  Enable AddressSanitizer, a fast memory error detector.

> diff --git a/gcc/value-prof.c b/gcc/value-prof.c

> index f0456c8e93d..64b9c9de6dd 100644

> --- a/gcc/value-prof.c

> +++ b/gcc/value-prof.c

> @@ -265,8 +265,10 @@ dump_histogram_value (FILE *dump_file, histogram_value hist)

>  		    ? "Top N value counter" : "Indirect call counter"));

>  	  if (hist->hvalue.counters)

>  	    {

> -	      fprintf (dump_file, " all: %" PRId64 ", values: ",

> -		       (int64_t) hist->hvalue.counters[0]);

> +	      fprintf (dump_file, " all: %" PRId64 "%s, values: ",

> +		       abs ((int64_t) hist->hvalue.counters[0]),

> +		       hist->hvalue.counters[0] < 0

> +		       ? " (values missing)": "");

>  	      for (unsigned i = 0; i < GCOV_TOPN_VALUES; i++)

>  		{

>  		  fprintf (dump_file, "[%" PRId64 ":%" PRId64 "]",

> @@ -719,26 +721,36 @@ gimple_divmod_fixed_value (gassign *stmt, tree value, profile_probability prob,

>  

>  /* Return the n-th value count of TOPN_VALUE histogram.  If

>     there's a value, return true and set VALUE and COUNT

> -   arguments.  */

> +   arguments.

> +

> +   Counters have the following meaning.

> +

> +   abs (counters[0]) is the number of executions

> +   for i in 0 ... TOPN-1

> +     counters[2 * i + 1] is target

> +     abs (counters[2 * i + 2]) is corresponding hitrate counter.

> +

> +   Value of counters[0] negative when counter became

> +   full during merging and some values are lost.  */

>  

>  bool

>  get_nth_most_common_value (gimple *stmt, const char *counter_type,

>  			   histogram_value hist, gcov_type *value,

>  			   gcov_type *count, gcov_type *all, unsigned n)

>  {

> -  if (hist->hvalue.counters[2] == -1)

> -    return false;

> -

>    gcc_assert (n < GCOV_TOPN_VALUES);

>  

>    *count = 0;

>    *value = 0;

>  

> -  gcov_type read_all = hist->hvalue.counters[0];

> +  gcov_type read_all = abs (hist->hvalue.counters[0]);

>  

>    gcov_type v = hist->hvalue.counters[2 * n + 1];

>    gcov_type c = hist->hvalue.counters[2 * n + 2];

>  

> +  if (flag_profile_reproducible && hist->hvalue.counters[0] < 0)

> +    return false;

> +

>    /* Indirect calls can't be verified.  */

>    if (stmt

>        && check_counter (stmt, counter_type, &c, &read_all,

> diff --git a/libgcc/libgcov-merge.c b/libgcc/libgcov-merge.c

> index 19b8ee72ae9..c0785b0bf10 100644

> --- a/libgcc/libgcov-merge.c

> +++ b/libgcc/libgcov-merge.c

> @@ -86,36 +86,47 @@ __gcov_merge_time_profile (gcov_type *counters, unsigned n_counters)

>  

>  #ifdef L_gcov_merge_topn

>  

> +/* To merging of TOPN profiles.

> +   counters[0] is the number of executions

> +   for i in 0 ... TOPN-1

> +     counters[2 * i + 1] is target

> +     counters[2 * i + 2] is corresponding hitrate counter.

> +

> +   Because we prune counters only those with probability >= 1/TOPN are

> +   present now.

> +

> +   We use sign of counters[0] to track whether the number of different

> +   targets exceeds TOPN.  */

> +

>  static void

>  merge_topn_values_set (gcov_type *counters)

>  {

>    /* First value is number of total executions of the profiler.  */

> -  gcov_type all = gcov_get_counter_ignore_scaling (-1);

> -  counters[0] += all;

> +  gcov_type all = gcov_get_counter ();

> +  gcov_type *total = &counters[0];

>    ++counters;

>  

> +  /* Negative value means that counter is missing some of values.  */

> +  if (all < 0)

> +    *total = -(*total);

> +

> +  *total += all;

> +

>    /* Read all part values.  */

>    gcov_type read_counters[2 * GCOV_TOPN_VALUES];

> -

>    for (unsigned i = 0; i < GCOV_TOPN_VALUES; i++)

>      {

>        read_counters[2 * i] = gcov_get_counter_target ();

>        read_counters[2 * i + 1] = gcov_get_counter_ignore_scaling (-1);

>      }

>  

> -  if (read_counters[1] == -1)

> -    {

> -      counters[1] = -1;

> -      return;

> -    }

> -

>    for (unsigned i = 0; i < GCOV_TOPN_VALUES; i++)

>      {

>        if (read_counters[2 * i + 1] == 0)

>  	continue;

>  

>        unsigned j;

> -      int slot = -1;

> +      int slot = 0;

>  

>        for (j = 0; j < GCOV_TOPN_VALUES; j++)

>  	{

> @@ -124,13 +135,15 @@ merge_topn_values_set (gcov_type *counters)

>  	      counters[2 * j + 1] += read_counters[2 * i + 1];

>  	      break;

>  	    }

> -	  else if (counters[2 * j + 1] == 0)

> +	  else if (counters[2 * j + 1] < counters[2 * slot + 1])

>  	    slot = j;

>  	}

>  

>        if (j == GCOV_TOPN_VALUES)

>  	{

> -	  if (slot > 0)

> +	  gcov_type slot_count = counters[2 * slot + 1];

> +	  /* We found an empty slot.  */

> +	  if (slot_count == 0)

>  	    {

>  	      /* If we found empty slot, add the value.  */

>  	      counters[2 * slot] = read_counters[2 * i];

> @@ -138,9 +151,16 @@ merge_topn_values_set (gcov_type *counters)

>  	    }

>  	  else

>  	    {

> -	      /* We haven't found a slot, bail out.  */

> -	      counters[1] = -1;

> -	      return;

> +	      /* Here we are loosing some values.  */

> +	      if (*total >= 0)

> +		*total = -(*total);

> +	      if (read_counters[2 * i + 1] > slot_count)

> +		{

> +		  counters[2 * slot] = read_counters[2 * i];

> +		  counters[2 * slot + 1] = read_counters[2 * i + 1];

> +		}

> +	      else

> +		counters[2 * slot + 1] -= read_counters[2 * i + 1];

>  	    }

>  	}

>      }

> -- 

> 2.25.0

>
Martin Liška Feb. 17, 2020, 9:47 a.m. | #3
On 2/13/20 2:35 PM, Jan Hubicka wrote:
> Otherwise the patch is OK and thanks for working on this!

> Honza


Hello.

There's updated version of the patch that comes with the
new option name.

If there's no comment I'm going to install it tomorrow.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Thanks,
Martin
From c1d19157c6683e2b28abc93bc3bedd4771bdb9a5 Mon Sep 17 00:00:00 2001
From: Martin Liska <mliska@suse.cz>

Date: Wed, 5 Feb 2020 14:44:43 +0100
Subject: [PATCH] Introduce -fprofile-reproducibility and support it with TOP
 N.

gcc/ChangeLog:

2020-02-05  Martin Liska  <mliska@suse.cz>

	PR ipa/92924
	* common.opt: Add -fprofile-reproducibility.
	* doc/invoke.texi: Document it.
	* value-prof.c (dump_histogram_value):
	Document and support behavior for counters[0]
	being a negative value.
	(get_nth_most_common_value): Handle negative
	counters[0] in respect to flag_profile_reproducible.

libgcc/ChangeLog:

2020-02-05  Martin Liska  <mliska@suse.cz>

	PR ipa/92924
	* libgcov-merge.c (merge_topn_values_set): Record
	when a TOP N counter becomes invalid.  When merging
	remove a smallest value if the space is needed.
---
 gcc/common.opt         | 16 ++++++++++++++
 gcc/coretypes.h        |  7 ++++++
 gcc/doc/invoke.texi    | 31 +++++++++++++++++++++++++-
 gcc/value-prof.c       | 29 ++++++++++++++++++------
 libgcc/libgcov-merge.c | 50 +++++++++++++++++++++++++++++-------------
 5 files changed, 110 insertions(+), 23 deletions(-)

diff --git a/gcc/common.opt b/gcc/common.opt
index 5692cd04374..fa9da505fc2 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2168,6 +2168,22 @@ fprofile-exclude-files=
 Common Joined RejectNegative Var(flag_profile_exclude_files)
 Instrument only functions from files where names do not match all the regular expressions (separated by a semi-colon).
 
+Enum
+Name(profile_reproducibility) Type(enum profile_reproducibility) UnknownError(unknown profile reproducibility method %qs)
+
+EnumValue
+Enum(profile_reproducibility) String(serial) Value(PROFILE_REPRODUCIBILITY_SERIAL)
+
+EnumValue
+Enum(profile_reproducibility) String(parallel-runs) Value(PROFILE_REPRODUCIBILITY_PARALLEL_RUNS)
+
+EnumValue
+Enum(profile_reproducibility) String(multithreaded) Value(PROFILE_REPRODUCIBILITY_MULTITHREADED)
+
+fprofile-reproducible
+Common Joined RejectNegative Var(flag_profile_reproducible) Enum(profile_reproducibility) Init(PROFILE_REPRODUCIBILITY_SERIAL)
+-fprofile-reproducible=[serial|parallel-runs|multithreaded] Control level of reproducibility of profile gathered by -fprofile-generate.
+
 Enum
 Name(profile_update) Type(enum profile_update) UnknownError(unknown profile update method %qs)
 
diff --git a/gcc/coretypes.h b/gcc/coretypes.h
index d8fd50d79a4..cda22697cc3 100644
--- a/gcc/coretypes.h
+++ b/gcc/coretypes.h
@@ -212,6 +212,13 @@ enum profile_update {
   PROFILE_UPDATE_PREFER_ATOMIC
 };
 
+/* Type of profile reproducibility methods.  */
+enum profile_reproducibility {
+    PROFILE_REPRODUCIBILITY_SERIAL,
+    PROFILE_REPRODUCIBILITY_PARALLEL_RUNS,
+    PROFILE_REPRODUCIBILITY_MULTITHREADED
+};
+
 /* Types of unwind/exception handling info that can be generated.  */
 
 enum unwind_info_type
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 3e47d06f0d5..ba2b7e42520 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -562,7 +562,7 @@ Objective-C and Objective-C++ Dialects}.
 -fprofile-abs-path @gol
 -fprofile-dir=@var{path}  -fprofile-generate  -fprofile-generate=@var{path} @gol
 -fprofile-note=@var{path}  -fprofile-update=@var{method} @gol
--fprofile-filter-files=@var{regex}  -fprofile-exclude-files=@var{regex} @gol
+-fprofile-filter-files=@var{regex}  -fprofile-exclude-files=@var{regex} -fprofile-reproducibility @gol
 -fsanitize=@var{style}  -fsanitize-recover  -fsanitize-recover=@var{style} @gol
 -fasan-shadow-offset=@var{number}  -fsanitize-sections=@var{s1},@var{s2},... @gol
 -fsanitize-undefined-trap-on-error  -fbounds-check @gol
@@ -13360,6 +13360,35 @@ all the regular expressions (separated by a semi-colon).
 For example, @option{-fprofile-exclude-files=/usr/*} will prevent instrumentation
 of all files that are located in @file{/usr/} folder.
 
+@item -fprofile-reproducible
+@opindex fprofile-reproducible
+Control level of reproducibility of profile gathered by
+@code{-fprofile-generate}.  This makes it possible to rebuild program
+with same outcome which is useful, for example, for distribution
+packages.
+
+With @option{-fprofile-reproducibility=serial} the profile gathered by
+@option{-fprofile-generate} is reproducible provided the trained program
+behaves the same at each invocation of the train run, it is not
+multi-threaded and profile data streaming is always done in the same
+order.  Note that profile streaming happens at the end of program run but
+also before @code{fork} function is invoked.
+
+Note that it is quite common that execution counts of some part of
+programs depends, for example, on length of temporary file names or
+memory space randomization (that may affect hash-table collision rate).
+Such non-reproducible part of programs may be annotated by
+@code{no_instrument_function} function attribute. @code{gcov-dump} with
+@option{-l} can be used to dump gathered data and verify that they are
+indeed reproducible.
+
+With @option{-fprofile-reproducibility=parallel-runs} collected profile
+stays reproducible regardless the order of streaming of the data into
+gcda files.  This setting makes it possible to run multiple instances of
+instrumented program in parallel (such as with @code{make -j}). This
+reduces quality of gathered data, in particular of indirect call
+profiling.
+
 @item -fsanitize=address
 @opindex fsanitize=address
 Enable AddressSanitizer, a fast memory error detector.
diff --git a/gcc/value-prof.c b/gcc/value-prof.c
index f0456c8e93d..5f940f40399 100644
--- a/gcc/value-prof.c
+++ b/gcc/value-prof.c
@@ -265,8 +265,10 @@ dump_histogram_value (FILE *dump_file, histogram_value hist)
 		    ? "Top N value counter" : "Indirect call counter"));
 	  if (hist->hvalue.counters)
 	    {
-	      fprintf (dump_file, " all: %" PRId64 ", values: ",
-		       (int64_t) hist->hvalue.counters[0]);
+	      fprintf (dump_file, " all: %" PRId64 "%s, values: ",
+		       abs ((int64_t) hist->hvalue.counters[0]),
+		       hist->hvalue.counters[0] < 0
+		       ? " (values missing)": "");
 	      for (unsigned i = 0; i < GCOV_TOPN_VALUES; i++)
 		{
 		  fprintf (dump_file, "[%" PRId64 ":%" PRId64 "]",
@@ -719,26 +721,39 @@ gimple_divmod_fixed_value (gassign *stmt, tree value, profile_probability prob,
 
 /* Return the n-th value count of TOPN_VALUE histogram.  If
    there's a value, return true and set VALUE and COUNT
-   arguments.  */
+   arguments.
+
+   Counters have the following meaning.
+
+   abs (counters[0]) is the number of executions
+   for i in 0 ... TOPN-1
+     counters[2 * i + 1] is target
+     abs (counters[2 * i + 2]) is corresponding hitrate counter.
+
+   Value of counters[0] negative when counter became
+   full during merging and some values are lost.  */
 
 bool
 get_nth_most_common_value (gimple *stmt, const char *counter_type,
 			   histogram_value hist, gcov_type *value,
 			   gcov_type *count, gcov_type *all, unsigned n)
 {
-  if (hist->hvalue.counters[2] == -1)
-    return false;
-
   gcc_assert (n < GCOV_TOPN_VALUES);
 
   *count = 0;
   *value = 0;
 
-  gcov_type read_all = hist->hvalue.counters[0];
+  gcov_type read_all = abs (hist->hvalue.counters[0]);
 
   gcov_type v = hist->hvalue.counters[2 * n + 1];
   gcov_type c = hist->hvalue.counters[2 * n + 2];
 
+  if (hist->hvalue.counters[0] < 0
+      && (flag_profile_reproducible == PROFILE_REPRODUCIBILITY_PARALLEL_RUNS
+	  || (flag_profile_reproducible
+	      == PROFILE_REPRODUCIBILITY_MULTITHREADED)))
+    return false;
+
   /* Indirect calls can't be verified.  */
   if (stmt
       && check_counter (stmt, counter_type, &c, &read_all,
diff --git a/libgcc/libgcov-merge.c b/libgcc/libgcov-merge.c
index 19b8ee72ae9..c0785b0bf10 100644
--- a/libgcc/libgcov-merge.c
+++ b/libgcc/libgcov-merge.c
@@ -86,36 +86,47 @@ __gcov_merge_time_profile (gcov_type *counters, unsigned n_counters)
 
 #ifdef L_gcov_merge_topn
 
+/* To merging of TOPN profiles.
+   counters[0] is the number of executions
+   for i in 0 ... TOPN-1
+     counters[2 * i + 1] is target
+     counters[2 * i + 2] is corresponding hitrate counter.
+
+   Because we prune counters only those with probability >= 1/TOPN are
+   present now.
+
+   We use sign of counters[0] to track whether the number of different
+   targets exceeds TOPN.  */
+
 static void
 merge_topn_values_set (gcov_type *counters)
 {
   /* First value is number of total executions of the profiler.  */
-  gcov_type all = gcov_get_counter_ignore_scaling (-1);
-  counters[0] += all;
+  gcov_type all = gcov_get_counter ();
+  gcov_type *total = &counters[0];
   ++counters;
 
+  /* Negative value means that counter is missing some of values.  */
+  if (all < 0)
+    *total = -(*total);
+
+  *total += all;
+
   /* Read all part values.  */
   gcov_type read_counters[2 * GCOV_TOPN_VALUES];
-
   for (unsigned i = 0; i < GCOV_TOPN_VALUES; i++)
     {
       read_counters[2 * i] = gcov_get_counter_target ();
       read_counters[2 * i + 1] = gcov_get_counter_ignore_scaling (-1);
     }
 
-  if (read_counters[1] == -1)
-    {
-      counters[1] = -1;
-      return;
-    }
-
   for (unsigned i = 0; i < GCOV_TOPN_VALUES; i++)
     {
       if (read_counters[2 * i + 1] == 0)
 	continue;
 
       unsigned j;
-      int slot = -1;
+      int slot = 0;
 
       for (j = 0; j < GCOV_TOPN_VALUES; j++)
 	{
@@ -124,13 +135,15 @@ merge_topn_values_set (gcov_type *counters)
 	      counters[2 * j + 1] += read_counters[2 * i + 1];
 	      break;
 	    }
-	  else if (counters[2 * j + 1] == 0)
+	  else if (counters[2 * j + 1] < counters[2 * slot + 1])
 	    slot = j;
 	}
 
       if (j == GCOV_TOPN_VALUES)
 	{
-	  if (slot > 0)
+	  gcov_type slot_count = counters[2 * slot + 1];
+	  /* We found an empty slot.  */
+	  if (slot_count == 0)
 	    {
 	      /* If we found empty slot, add the value.  */
 	      counters[2 * slot] = read_counters[2 * i];
@@ -138,9 +151,16 @@ merge_topn_values_set (gcov_type *counters)
 	    }
 	  else
 	    {
-	      /* We haven't found a slot, bail out.  */
-	      counters[1] = -1;
-	      return;
+	      /* Here we are loosing some values.  */
+	      if (*total >= 0)
+		*total = -(*total);
+	      if (read_counters[2 * i + 1] > slot_count)
+		{
+		  counters[2 * slot] = read_counters[2 * i];
+		  counters[2 * slot + 1] = read_counters[2 * i + 1];
+		}
+	      else
+		counters[2 * slot + 1] -= read_counters[2 * i + 1];
 	    }
 	}
     }
-- 
2.25.0
Martin Liška Feb. 18, 2020, 3:31 p.m. | #4
On 1/30/20 5:13 PM, Jan Hubicka wrote:
> @@ -330,7 +334,7 @@

> 

>   stream_out_histogram_value (struct output_block *ob, histogram_value hist)

> 

> /* When user uses an unsigned type with a big value, constant converted

> to gcov_type (a signed type) can be negative. */

> gcov_type value = hist->hvalue.counters[i];

> - if (hist->type == HIST_TYPE_TOPN_VALUES && i > 0)

> + if (hist->type == HIST_TYPE_TOPN_VALUES)

> ;

> else

> gcc_assert (value >= 0);


I forgot to include this obvious hunk in my installed patch and
I noticed that during LTO PGO bootstrap.

Let's fix it.
Martin
From 5c72754ab734d9b43db19d51ce196d582d85002c Mon Sep 17 00:00:00 2001
From: Martin Liska <mliska@suse.cz>

Date: Tue, 18 Feb 2020 16:18:32 +0100
Subject: [PATCH] Restore LTO PGO bootstrap after ea0b12523d0d9a9059b5.

gcc/ChangeLog:

2020-02-18  Martin Liska  <mliska@suse.cz>

	* value-prof.c (stream_out_histogram_value): Restore LTO PGO
	bootstrap by missing removal of invalid sanity check.
---
 gcc/value-prof.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/value-prof.c b/gcc/value-prof.c
index 5f940f40399..8e9f129708a 100644
--- a/gcc/value-prof.c
+++ b/gcc/value-prof.c
@@ -332,7 +332,7 @@ stream_out_histogram_value (struct output_block *ob, histogram_value hist)
       /* When user uses an unsigned type with a big value, constant converted
 	 to gcov_type (a signed type) can be negative.  */
       gcov_type value = hist->hvalue.counters[i];
-      if (hist->type == HIST_TYPE_TOPN_VALUES && i > 0)
+      if (hist->type == HIST_TYPE_TOPN_VALUES)
 	;
       else
 	gcc_assert (value >= 0);
-- 
2.25.0
Gerald Pfeifer Feb. 24, 2020, 9:28 a.m. | #5
On Mon, 17 Feb 2020, Martin Liška wrote:
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.


This (or rather its predecessor?) breaks bootstrap on 32-bit 
i386-unknown-freebsd11.3.

/scratch/tmp/gerald/gcc10-devel-work/gcc-10-20200223/gcc/value-prof.c: In function 'void dump_histogram_value(FILE*, histogram_value)':
/scratch/tmp/gerald/gcc10-devel-work/gcc-10-20200223/gcc/value-prof.c:268:28: error: format '%lld' expects argument of type 'long long int', but argument 3 hastype 'int' [-Werror=format=]
  268 |        fprintf (dump_file, " all: %" PRId64 "%s, values: ",
      |                            ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  269 |          abs ((int64_t) hist->hvalue.counters[0]),
      |          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      |              |
      |              int

(I'm not sure why my nightly tester has not caught this, but only
the snapshot did.)

Gerald
Gerald Pfeifer Feb. 27, 2020, 8:19 p.m. | #6
On Mon, 24 Feb 2020, Gerald Pfeifer wrote:
> This (or rather its predecessor?) breaks bootstrap on 32-bit 

> i386-unknown-freebsd11.3.

> 

> /scratch/tmp/gerald/gcc10-devel-work/gcc-10-20200223/gcc/value-prof.c: In function 'void dump_histogram_value(FILE*, histogram_value)':

> /scratch/tmp/gerald/gcc10-devel-work/gcc-10-20200223/gcc/value-prof.c:268:28: error: format '%lld' expects argument of type 'long long int', but argument 3 hastype 'int' [-Werror=format=]

>   268 |        fprintf (dump_file, " all: %" PRId64 "%s, values: ",

>       |                            ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

>   269 |          abs ((int64_t) hist->hvalue.counters[0]),

>       |          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

>       |              |

>       |              int

> 

> (I'm not sure why my nightly tester has not caught this, but only

> the snapshot did.)


This is now https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93962 .

Gerald
Gerald Pfeifer Feb. 27, 2020, 11:34 p.m. | #7
On Thu, 27 Feb 2020, Gerald Pfeifer wrote:
>> This (or rather its predecessor?) breaks bootstrap on 32-bit 

>> i386-unknown-freebsd11.3.

>> 

>> /scratch/tmp/gerald/gcc10-devel-work/gcc-10-20200223/gcc/value-prof.c: In function 'void dump_histogram_value(FILE*, histogram_value)':

>> /scratch/tmp/gerald/gcc10-devel-work/gcc-10-20200223/gcc/value-prof.c:268:28: error: format '%lld' expects argument of type 'long long int', but argument 3 hastype 'int' [-Werror=format=]

>>   268 |        fprintf (dump_file, " all: %" PRId64 "%s, values: ",

>>       |                            ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

>>   269 |          abs ((int64_t) hist->hvalue.counters[0]),

>>       |          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

>>       |              |

>>       |              int

>> 

>> (I'm not sure why my nightly tester has not caught this, but only

>> the snapshot did.)

> This is now https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93962 .


And Andrew had a very good hint there (thanks!).  

The patch below indeed restores the build on i386-unknown-freebsd11.

Okay?  Or does this qualify as obvious?

Gerald


2020-02-28  Gerald Pfeifer  <gerald@pfeifer.com>
	    Andrew Pinski  <apinski@marvell.com>

	PR bootstrap/93962
	* value-prof.c (dump_histogram_value): Use std::abs instead of
	abs.
 
diff --git a/gcc/value-prof.c b/gcc/value-prof.c
index 8e9f129708a..585b909096f 100644
--- a/gcc/value-prof.c
+++ b/gcc/value-prof.c
@@ -266,7 +266,7 @@ dump_histogram_value (FILE *dump_file, histogram_value hist)
 	  if (hist->hvalue.counters)
 	    {
 	      fprintf (dump_file, " all: %" PRId64 "%s, values: ",
-		       abs ((int64_t) hist->hvalue.counters[0]),
+		       std::abs ((int64_t) hist->hvalue.counters[0]),
 		       hist->hvalue.counters[0] < 0
 		       ? " (values missing)": "");
 	      for (unsigned i = 0; i < GCOV_TOPN_VALUES; i++)
Martin Liška March 2, 2020, 9:30 a.m. | #8
On 2/28/20 12:34 AM, Gerald Pfeifer wrote:
> Okay?  Or does this qualify as obvious?


The patch seems to me obvious. Please install it.

Martin

Patch

diff --git a/gcc/value-prof.c b/gcc/value-prof.c
index f0456c8e93d..e906bd49848 100644
--- a/gcc/value-prof.c
+++ b/gcc/value-prof.c
@@ -265,13 +265,17 @@  dump_histogram_value (FILE *dump_file, histogram_value hist)
 		    ? "Top N value counter" : "Indirect call counter"));
 	  if (hist->hvalue.counters)
 	    {
-	      fprintf (dump_file, " all: %" PRId64 ", values: ",
-		       (int64_t) hist->hvalue.counters[0]);
+	      fprintf (dump_file, " all: %" PRId64 "%s, values: ",
+		       abs ((int64_t) hist->hvalue.counters[0]),
+		       hist->hvalue.counters[0] < 0
+		       ? " (values missing)": "");
 	      for (unsigned i = 0; i < GCOV_TOPN_VALUES; i++)
 		{
-		  fprintf (dump_file, "[%" PRId64 ":%" PRId64 "]",
+		  fprintf (dump_file, "[%" PRId64 ":%" PRId64 "%s]",
 			   (int64_t) hist->hvalue.counters[2 * i + 1],
-			   (int64_t) hist->hvalue.counters[2 * i + 2]);
+			   (int64_t) abs (hist->hvalue.counters[2 * i + 2]),
+			   hist->hvalue.counters[2 * i + 2] < 0
+			   ? " (unreproducible)" : "");
 		  if (i != GCOV_TOPN_VALUES - 1)
 		    fprintf (dump_file, ", ");
 		}
@@ -330,7 +334,7 @@  stream_out_histogram_value (struct output_block *ob, histogram_value hist)
       /* When user uses an unsigned type with a big value, constant converted
 	 to gcov_type (a signed type) can be negative.  */
       gcov_type value = hist->hvalue.counters[i];
-      if (hist->type == HIST_TYPE_TOPN_VALUES && i > 0)
+      if (hist->type == HIST_TYPE_TOPN_VALUES)
 	;
       else
 	gcc_assert (value >= 0);
@@ -719,26 +723,47 @@  gimple_divmod_fixed_value (gassign *stmt, tree value, profile_probability prob,
 
 /* Return the n-th value count of TOPN_VALUE histogram.  If
    there's a value, return true and set VALUE and COUNT
-   arguments.  */
+   arguments.
+
+   Counters have the following meanings.
+
+   abs (counters[0]) is the number of executions
+   for i in 0 ... TOPN-1
+     counters[2 * i + 1] is target
+     abs (counters[2 * i + 2]) is corresponding hitrate counter.
+
+   Value of counters[0] negative when counter became
+   full during merging and some values are lost.
+
+   Value of counters[2 * i + 2] is negative if there was run when the
+   corresponding targt was not having probability 1/4.
+
+   If both counters[0] and counters[2 * i + 2] are negatie we do not know the
+   precise hitrate count of the target in case order of merges is not fixed
+   (as with parallel profiledbootstrap).
+  */
 
 bool
 get_nth_most_common_value (gimple *stmt, const char *counter_type,
 			   histogram_value hist, gcov_type *value,
 			   gcov_type *count, gcov_type *all, unsigned n)
 {
-  if (hist->hvalue.counters[2] == -1)
-    return false;
-
   gcc_assert (n < GCOV_TOPN_VALUES);
 
   *count = 0;
   *value = 0;
 
-  gcov_type read_all = hist->hvalue.counters[0];
+  gcov_type read_all = abs (hist->hvalue.counters[0]);
 
   gcov_type v = hist->hvalue.counters[2 * n + 1];
   gcov_type c = hist->hvalue.counters[2 * n + 2];
 
+  /* Negative value marks that target is not necessarily reproducible
+     for parallel merging.  */
+  if (c < 0 && hist->hvalue.counters[0] < 0)
+    return false;
+  c = abs (c);
+
   /* Indirect calls can't be verified.  */
   if (stmt
       && check_counter (stmt, counter_type, &c, &read_all,
diff --git a/libgcc/libgcov-merge.c b/libgcc/libgcov-merge.c
index 19b8ee72ae9..2cf6ebf04ea 100644
--- a/libgcc/libgcov-merge.c
+++ b/libgcc/libgcov-merge.c
@@ -86,64 +86,110 @@  __gcov_merge_time_profile (gcov_type *counters, unsigned n_counters)
 
 #ifdef L_gcov_merge_topn
 
+/* To merging of TOPN profiles.
+
+   counters[0] is the number of executions
+   for i in 0 ... TOPN-1
+     counters[2 * i + 1] is target
+     counters[2 * i + 2] is corresponding hitrate counter.
+
+   Because we prune counters only those with probability >= 1/TOPN are
+   present now.
+
+   We use sign of counters[0] to track whether the number of different
+   targets exceeds TOPN and sign of counters[2 * i + 2] to track whether given
+   value was having probability at least 1/TOPN in each run.  */
 static void
 merge_topn_values_set (gcov_type *counters)
 {
   /* First value is number of total executions of the profiler.  */
-  gcov_type all = gcov_get_counter_ignore_scaling (-1);
-  counters[0] += all;
-  ++counters;
-
-  /* Read all part values.  */
-  gcov_type read_counters[2 * GCOV_TOPN_VALUES];
+  gcov_type all = gcov_get_counter ();
 
-  for (unsigned i = 0; i < GCOV_TOPN_VALUES; i++)
+  /* Early returns (needed for logic tracking sign bits below).
+     If there is nothing to merge in, return early.  */
+  if (all == 0)
     {
-      read_counters[2 * i] = gcov_get_counter_target ();
-      read_counters[2 * i + 1] = gcov_get_counter_ignore_scaling (-1);
+      for (unsigned i = 0; i < GCOV_TOPN_VALUES; i++)
+	{
+	  gcov_get_counter_target ();
+	  gcov_get_counter ();
+	}
+      return;
     }
 
-  if (read_counters[1] == -1)
+  /* Counter is not trained at all; copy data.  */
+  if (!counters[0])
     {
-      counters[1] = -1;
+      counters[0] = all;
+      ++counters;
+      for (unsigned i = 0; i < GCOV_TOPN_VALUES; i++)
+	{
+	  counters[2 * i] = gcov_get_counter_target ();
+	  counters[2 * i + 1] = gcov_get_counter ();
+	}
       return;
     }
 
+  /* Negative value mans that counters is missing some of values.  */
+  if (all < 0)
+    counters[0] = -counters[0];
+  counters[0] += all;
+  ++counters;
+
+  char updated[GCOV_TOPN_VALUES];
+  for (unsigned i = 0; i < GCOV_TOPN_VALUES; i++)
+    updated[i] = 0;
+
   for (unsigned i = 0; i < GCOV_TOPN_VALUES; i++)
     {
-      if (read_counters[2 * i + 1] == 0)
+      gcov_type read_value = gcov_get_counter_target ();
+      gcov_type read_cnt = gcov_get_counter ();
+
+      if (read_cnt == 0)
 	continue;
 
       unsigned j;
-      int slot = -1;
+      int slot = 0;
 
       for (j = 0; j < GCOV_TOPN_VALUES; j++)
 	{
-	  if (counters[2 * j] == read_counters[2 * i])
+	  if (counters[2 * j] == read_value)
 	    {
-	      counters[2 * j + 1] += read_counters[2 * i + 1];
+	      /* Negative value means that counter was not present in every
+		 run.  */
+	      if (read_cnt < 0)
+		counters[2 * j + 1] = -counters[2 * j + 1];
+	      counters[2 * j + 1] += read_cnt;
+	      updated[j] = 1;
 	      break;
 	    }
-	  else if (counters[2 * j + 1] == 0)
+	  /* Look for least probable counter.  At this moment already some
+	     of counters may be negative.  */
+	  if (abs (counters[2 * j + 1]) < abs (counters[2 * slot + 1]))
 	    slot = j;
 	}
 
       if (j == GCOV_TOPN_VALUES)
 	{
-	  if (slot > 0)
+	  /* Counter is full.  Throw away least frequent values but
+	     mark that some values gone missing.  */
+	  if (counters[2 * slot + 1] && counters[-1] > 0)
+	    counters[-1] = -counters[-1];
+	  if (abs (counters[2 * slot + 1]) < abs (read_cnt))
 	    {
-	      /* If we found empty slot, add the value.  */
-	      counters[2 * slot] = read_counters[2 * i];
-	      counters[2 * slot + 1] = read_counters[2 * i + 1];
-	    }
-	  else
-	    {
-	      /* We haven't found a slot, bail out.  */
-	      counters[1] = -1;
-	      return;
+	      counters[2 * slot] = read_value;
+	      counters[2 * slot + 1] = read_cnt;
+	      /* Mark that the value was not present in every run.  */
+	      if (counters[2 * slot + 1] > 0)
+		counters[2 * slot + 1] = -counters[2 * slot + 1];
 	    }
 	}
     }
+  /* Finally all values which was not present in read counters must be
+     marked as not present in all runs.  */
+  for (unsigned i = 0; i < GCOV_TOPN_VALUES; i++)
+    if (!updated[i] && counters[2 * i + 1] > 0)
+      counters[2 * i + 1] = -counters[2 * i + 1];
 }
 
 /* The profile merging function for choosing the most common value.