Fix Bug 83237 - Values returned by std::poisson_distribution are not distributed correctly

Message ID 996b2c2c-45e7-b434-e5d9-4441d2273829@tiscali.it
State New
Headers show
Series
  • Fix Bug 83237 - Values returned by std::poisson_distribution are not distributed correctly
Related show

Commit Message

Michele Pezzutti Dec. 10, 2017, 1:47 p.m.
Hi.

This patch intends to fix Bug 83237 - Values returned by 
std::poisson_distribution are not distributed correctly.
See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83237for issue 
description and tests.


     * include/bits/random.tcc (poisson_distribution<_IntType>::operator()):
Value of the comparison function shall be set to 1/78 for __x = 1

Comments

Paolo Carlini Dec. 11, 2017, 9:52 a.m. | #1
Hi,

On 10/12/2017 14:47, Michele Pezzutti wrote:
> Hi.

>

> This patch intends to fix Bug 83237 - Values returned by 

> std::poisson_distribution are not distributed correctly.

> See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83237for issue 

> description and tests.

In any case, the fix should come with a testcase, which would also 
validate the analysis. For this discrete distribution should be pretty 
easy to add one, because all the infrastructure is already in place, 
essentially three lines added to 
26_numerics/random/poisson_distribution/operators/values.cc.

Paolo.
Paolo Carlini Dec. 11, 2017, 9:52 a.m. | #2
Hi,

On 10/12/2017 14:47, Michele Pezzutti wrote:
> Hi.

>

> This patch intends to fix Bug 83237 - Values returned by 

> std::poisson_distribution are not distributed correctly.

> See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83237for issue 

> description and tests.

In any case, the fix should come with a testcase, which would also 
validate the analysis. For this discrete distribution should be pretty 
easy to add one, because all the infrastructure is already in place, 
essentially three lines added to 
26_numerics/random/poisson_distribution/operators/values.cc.

Paolo.
Michele Pezzutti Dec. 11, 2017, 8:58 p.m. | #3
I apologize as I am unable to run the test suitein its full form.

Nevertheless, I locally tested the following patch for the test suite:

*26_numerics/random/poisson_distribution/operators/values.cc
     Add additional test to cover bin 'floor(mu) + 1'

by compiling locally values.cc and running it as main().


I had to increase N from default value N = 100000 to N = 4000000 to 
trigger the fault.
I think that for smaller N, the deviation due to the bug is well within 
the confidence
interval set in testDiscreteDist().


The test case below is successful with the proposed patch.


diff --git a/values.cc b/values.cc
index 0039b7d..e12e54f 100644
--- a/values.cc
+++ b/values.cc
@@ -42,6 +42,10 @@ void test01()
    std::poisson_distribution<> pd3(30.0);
    auto bpd3 = std::bind(pd3, eng);
    testDiscreteDist(bpd3, [](int n) { return poisson_pdf(n, 30.0); } );
+
+  std::poisson_distribution<> pd4(37.17);
+  auto bpd4 = std::bind(pd4, eng);
+  testDiscreteDist<100, 4000000>(bpd4, [](int n) { return 
poisson_pdf(n, 37.17); } );
  }



On 12/11/2017 10:52 AM, Paolo Carlini wrote:
> Hi,

>

> On 10/12/2017 14:47, Michele Pezzutti wrote:

>> Hi.

>>

>> This patch intends to fix Bug 83237 - Values returned by 

>> std::poisson_distribution are not distributed correctly.

>> See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83237 for issue 

>> description and tests.

> In any case, the fix should come with a testcase, which would also 

> validate the analysis. For this discrete distribution should be pretty 

> easy to add one, because all the infrastructure is already in place, 

> essentially three lines added to 

> 26_numerics/random/poisson_distribution/operators/values.cc.

>

> Paolo.
Michele Pezzutti Dec. 11, 2017, 10:16 p.m. | #4
I lowered to N = 2500000 and still fails with a good margin.


On 12/11/2017 09:58 PM, Michele Pezzutti wrote:
> I apologize as I am unable to run the test suite in its full form.

>

> Nevertheless, I locally tested the following patch for the test suite:

>

> *26_numerics/random/poisson_distribution/operators/values.cc

>     Add additional test to cover bin 'floor(mu) + 1'

>

> by compiling locally values.cc and running it as main().

>

>

> I had to increase N from default value N = 100000 to N = 4000000 to 

> trigger the fault.

> I think that for smaller N, the deviation due to the bug is well 

> within the confidence

> interval set in testDiscreteDist().

>

>

> The test case below is successful with the proposed patch.

>

>

> diff --git a/values.cc b/values.cc

> index 0039b7d..e12e54f 100644

> --- a/values.cc

> +++ b/values.cc

> @@ -42,6 +42,10 @@ void test01()

>    std::poisson_distribution<> pd3(30.0);

>    auto bpd3 = std::bind(pd3, eng);

>    testDiscreteDist(bpd3, [](int n) { return poisson_pdf(n, 30.0); } );

> +

> +  std::poisson_distribution<> pd4(37.17);

> +  auto bpd4 = std::bind(pd4, eng);

> +  testDiscreteDist<100, 4000000>(bpd4, [](int n) { return 

> poisson_pdf(n, 37.17); } );

>  }

>

>

>

> On 12/11/2017 10:52 AM, Paolo Carlini wrote:

>> Hi,

>>

>> On 10/12/2017 14:47, Michele Pezzutti wrote:

>>> Hi.

>>>

>>> This patch intends to fix Bug 83237 - Values returned by 

>>> std::poisson_distribution are not distributed correctly.

>>> See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83237 for issue 

>>> description and tests.

>> In any case, the fix should come with a testcase, which would also 

>> validate the analysis. For this discrete distribution should be 

>> pretty easy to add one, because all the infrastructure is already in 

>> place, essentially three lines added to 

>> 26_numerics/random/poisson_distribution/operators/values.cc.

>>

>> Paolo.

>
Paolo Carlini Dec. 12, 2017, 12:51 a.m. | #5
Hi,

On 11/12/2017 23:16, Michele Pezzutti wrote:
> I lowered to N = 2500000 and still fails with a good margin.

Good. At the moment however, I think we need a bit of rationale for the 
change that you are proposing, what would you put in a comment in the 
code? It's been a while since the last time I looked into these 
algorithms, is there a simple way to explain why the change is needed 
within the basic rejection method proposed by Devroye? Devroye's book is 
freely available, have you been able to study the relevant bits already? 
(http://www.nrbook.com/devroye/). He is also very approachable in 
private email, if I remember correctly.

Eventually, we could also agree on a good way to extend the coverage of 
the testing, maybe for gcc8 simply add the testcase, but then, for gcc9 
I think we could extend it quite a bit in a consistent way, something 
like a grid from 1.0 to 50 step 1.0 with an increased N. Better if we 
figure out something that looks generic but would also have caught 
anyway 83237, if you see what I mean. I can take care of that. For the 
other discrete distributions too of course.

Thanks a lot for your help!

Paolo.
Michele Pezzutti Dec. 12, 2017, 6:42 p.m. | #6
Hi.

Yes, I looked at the text before submitting the patch.

I contacted Devroye and he confirmed that another reader had also 
pointed out this bug but not the solution. I sent him my proposed patch, 
he will look into it (no idea when though).

I would state that "comparison function for x = 1 is e^(1/78)" (which 
becomes 1/78 as the algorithm uses log-probabilities).

I think the change is needed because otherwise, for that particular bin, 
the rejection probability is lower than it should be, resulting in a 
higher number of samples.

Regarding your second point, I understand what you mean and I would be 
glad to help.


On 12/12/2017 01:51 AM, Paolo Carlini wrote:
> Hi,

>

> On 11/12/2017 23:16, Michele Pezzutti wrote:

>> I lowered to N = 2500000 and still fails with a good margin.

> Good. At the moment however, I think we need a bit of rationale for 

> the change that you are proposing, what would you put in a comment in 

> the code? It's been a while since the last time I looked into these 

> algorithms, is there a simple way to explain why the change is needed 

> within the basic rejection method proposed by Devroye? Devroye's book 

> is freely available, have you been able to study the relevant bits 

> already? (http://www.nrbook.com/devroye/). He is also very 

> approachable in private email, if I remember correctly.

>

> Eventually, we could also agree on a good way to extend the coverage 

> of the testing, maybe for gcc8 simply add the testcase, but then, for 

> gcc9 I think we could extend it quite a bit in a consistent way, 

> something like a grid from 1.0 to 50 step 1.0 with an increased N. 

> Better if we figure out something that looks generic but would also 

> have caught anyway 83237, if you see what I mean. I can take care of 

> that. For the other discrete distributions too of course.

>

> Thanks a lot for your help!

>

> Paolo.

>
Paolo Carlini Dec. 12, 2017, 8:37 p.m. | #7
Hi,

On 12/12/2017 19:42, Michele Pezzutti wrote:
> Hi.

>

> Yes, I looked at the text before submitting the patch.

>

> I contacted Devroye and he confirmed that another reader had also 

> pointed out this bug but not the solution. I sent him my proposed 

> patch, he will look into it (no idea when though).

Nice.
> I would state that "comparison function for x = 1 is e^(1/78)" (which 

> becomes 1/78 as the algorithm uses log-probabilities).

>

> I think the change is needed because otherwise, for that particular 

> bin, the rejection probability is lower than it should be, resulting 

> in a higher number of samples.

Ok. Ideally I would be much less nervous about committing the patch if 
we either 1- Had Luc's explicit green light; 2- Were able to *rigorously 
deduce* within the framework of the book why the change is needed. That 
said, the patch makes sense to me and so far holds up well in all my 
tests (I'm currently running a full make check). I would say, let's wait 
a week or so and then make the final decision. Jon, do you agree? Ideas 
about further testing? (eg, some code you are aware of stressing Poisson?)

Thanks again,
Paolo.
Jonathan Wakely Dec. 13, 2017, 11:51 a.m. | #8
On 12/12/17 21:37 +0100, Paolo Carlini wrote:
>Hi,

>

>On 12/12/2017 19:42, Michele Pezzutti wrote:

>>Hi.

>>

>>Yes, I looked at the text before submitting the patch.

>>

>>I contacted Devroye and he confirmed that another reader had also 

>>pointed out this bug but not the solution. I sent him my proposed 

>>patch, he will look into it (no idea when though).

>Nice.

>>I would state that "comparison function for x = 1 is e^(1/78)" 

>>(which becomes 1/78 as the algorithm uses log-probabilities).

>>

>>I think the change is needed because otherwise, for that particular 

>>bin, the rejection probability is lower than it should be, resulting 

>>in a higher number of samples.

>Ok. Ideally I would be much less nervous about committing the patch if 

>we either 1- Had Luc's explicit green light; 2- Were able to 

>*rigorously deduce* within the framework of the book why the change is 

>needed. That said, the patch makes sense to me and so far holds up 

>well in all my tests (I'm currently running a full make check). I 

>would say, let's wait a week or so and then make the final decision. 

>Jon, do you agree? Ideas about further testing? (eg, some code you are 

>aware of stressing Poisson?)


No, I have nothing useful to add here, but I CC'd Ed on the PR as I'd
like his input.
Michele Pezzutti Dec. 14, 2017, 10:11 a.m. | #9
If Luc's explicit green light will not arrive before it is decision
time, Paolo's point 2- below is doable.

Il 13.12.2017 12:51 Jonathan
Wakely ha scritto: 

> On 12/12/17 21:37 +0100, Paolo Carlini wrote:

>


>> Hi, On 12/12/2017 19:42, Michele Pezzutti wrote: 

>> 

>>> Hi. Yes, I

looked at the text before submitting the patch. I contacted Devroye and
he confirmed that another reader had also pointed out this bug but not
the solution. I sent him my proposed patch, he will look into it (no
idea when though).
>> Nice. 

>> 

>>> I would state that "comparison

function for x = 1 is e^(1/78)" (which becomes 1/78 as the algorithm
uses log-probabilities). I think the change is needed because otherwise,
for that particular bin, the rejection probability is lower than it
should be, resulting in a higher number of samples.
>> Ok. Ideally I

would be much less nervous about committing the patch if we either 1-
Had Luc's explicit green light; 2- Were able to *rigorously deduce*
within the framework of the book why the change is needed. That said,
the patch makes sense to me and so far holds up well in all my tests
(I'm currently running a full make check). I would say, let's wait a
week or so and then make the final decision. Jon, do you agree? Ideas
about further testing? (eg, some code you are aware of stressing
Poisson?)
> 

> No, I have nothing useful to add here, but I CC'd Ed on

the PR as I'd
> like his input.

  


Con Mobile Open 7 GB a 9 euro/4 sett navighi veloce con 7 GB di Internet e hai 200 minuti ed SMS a 12 cent. Passa a Tiscali Mobile! http://tisca.li/OPEN7GBFirma
Michele Pezzutti Dec. 23, 2017, 10:10 p.m. | #10
I got confirmation from Luc.
He also added it to the errata file---the entries regarding p. 511, page 
6 of http://luc.devroye.org/errors.pdf

On 12/14/2017 11:11 AM, mpezz@tiscali.it wrote:
> If Luc's explicit green light will not arrive before it is decision 

> time, Paolo's point 2- below is doable.

>

> Il 13.12.2017 12:51 Jonathan Wakely ha scritto:

>

>> On 12/12/17 21:37 +0100, Paolo Carlini wrote:

>>> Hi, On 12/12/2017 19:42, Michele Pezzutti wrote:

>>>> Hi. Yes, I looked at the text before submitting the patch. I 

>>>> contacted Devroye and he confirmed that another reader had also 

>>>> pointed out this bug but not the solution. I sent him my proposed 

>>>> patch, he will look into it (no idea when though).

>>> Nice.

>>>> I would state that "comparison function for x = 1 is e^(1/78)" 

>>>> (which becomes 1/78 as the algorithm uses log-probabilities). I 

>>>> think the change is needed because otherwise, for that particular 

>>>> bin, the rejection probability is lower than it should be, 

>>>> resulting in a higher number of samples.

>>> Ok. Ideally I would be much less nervous about committing the patch 

>>> if we either 1- Had Luc's explicit green light; 2- Were able to 

>>> *rigorously deduce* within the framework of the book why the change 

>>> is needed. That said, the patch makes sense to me and so far holds 

>>> up well in all my tests (I'm currently running a full make check). I 

>>> would say, let's wait a week or so and then make the final decision. 

>>> Jon, do you agree? Ideas about further testing? (eg, some code you 

>>> are aware of stressing Poisson?)

>> No, I have nothing useful to add here, but I CC'd Ed on the PR as I'd

>> like his input.

>>

>>

>
Paolo Carlini Dec. 24, 2017, 3:54 p.m. | #11
Hi,

On 23/12/2017 23:10, Michele Pezzutti wrote:
> I got confirmation from Luc.

> He also added it to the errata file---the entries regarding p. 511, 

> page 6 of http://luc.devroye.org/errors.pdf

Nice. Then I'm going to commit the below, tested x86_64-linux. Thanks again!

Cheers,
Paolo.

//////////////////
2017-12-24  Michele Pezzutti <mpezz@tiscali.it>

	PR libstdc++/83237
	* include/bits/random.tcc (poisson_distribution<>::operator()):
	Fix __x = 1 case - see updated Errata of Devroye's treatise.
	* testsuite/26_numerics/random/poisson_distribution/operators/
	values.cc: Add test.
	* testsuite/26_numerics/random/pr60037-neg.cc: Adjust dg-error
	line number.
Index: include/bits/random.tcc
===================================================================
--- include/bits/random.tcc	(revision 255991)
+++ include/bits/random.tcc	(working copy)
@@ -1301,6 +1301,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	    const double __c2 = __param._M_c2b + __c1;
 	    const double __c3 = __c2 + 1;
 	    const double __c4 = __c3 + 1;
+	    // 1 / 78
+	    const double __178 = 0.0128205128205128205128205128205128L;
 	    // e^(1 / 78)
 	    const double __e178 = 1.0129030479320018583185514777512983L;
 	    const double __c5 = __c4 + __e178;
@@ -1340,7 +1342,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 		else if (__u <= __c4)
 		  __x = 0;
 		else if (__u <= __c5)
-		  __x = 1;
+		  {
+		    __x = 1;
+		    // Only in the Errata, see libstdc++/83237.
+		    __w = __178;
+		  }
 		else
 		  {
 		    const double __v = -std::log(1.0 - __aurng());
Index: testsuite/26_numerics/random/poisson_distribution/operators/values.cc
===================================================================
--- testsuite/26_numerics/random/poisson_distribution/operators/values.cc	(revision 255991)
+++ testsuite/26_numerics/random/poisson_distribution/operators/values.cc	(working copy)
@@ -42,6 +42,12 @@ void test01()
   std::poisson_distribution<> pd3(30.0);
   auto bpd3 = std::bind(pd3, eng);
   testDiscreteDist(bpd3, [](int n) { return poisson_pdf(n, 30.0); } );
+
+  // libstdc++/83237
+  std::poisson_distribution<> pd4(37.17);
+  auto bpd4 = std::bind(pd4, eng);
+  testDiscreteDist<100, 2000000>(bpd4, [](int n)
+				 { return poisson_pdf(n, 37.17); } );
 }
 
 int main()
Index: testsuite/26_numerics/random/pr60037-neg.cc
===================================================================
--- testsuite/26_numerics/random/pr60037-neg.cc	(revision 255991)
+++ testsuite/26_numerics/random/pr60037-neg.cc	(working copy)
@@ -11,4 +11,4 @@ auto x = std::generate_canonical<std::size_t,
 
 // { dg-error "static assertion failed: template argument must be a floating point type" "" { target *-*-* } 156 }
 
-// { dg-error "static assertion failed: template argument must be a floating point type" "" { target *-*-* } 3311 }
+// { dg-error "static assertion failed: template argument must be a floating point type" "" { target *-*-* } 3317 }

Patch

diff --git a/random.tcc b/random.tcc
index 95bcf0a..f36587e 100644
--- a/random.tcc
+++ b/random.tcc
@@ -1301,6 +1301,8 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
          const double __c2 = __param._M_c2b + __c1;
          const double __c3 = __c2 + 1;
          const double __c4 = __c3 + 1;
+        // 1 / 78
+        const double __178 = 0.0128205128205128205128205128205128L;
          // e^(1 / 78)
          const double __e178 = 1.0129030479320018583185514777512983L;
          const double __c5 = __c4 + __e178;
@@ -1340,7 +1342,10 @@  _GLIBCXX_BEGIN_NAMESPACE_VERSION
          else if (__u <= __c4)
            __x = 0;
          else if (__u <= __c5)
+          {
            __x = 1;
+          __w = __178;
+          }
          else
            {
              const double __v = -std::log(1.0 - __aurng());