Message ID  996b2c2c45e7b434e5d94441d2273829@tiscali.it 

State  New 
Headers  show 
Series 

Related  show 
Hi, On 10/12/2017 14:47, Michele Pezzutti wrote: > Hi. > > This patch intends to fix Bug 83237  Values returned by > std::poisson_distribution are not distributed correctly. > See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83237for issue > description and tests. In any case, the fix should come with a testcase, which would also validate the analysis. For this discrete distribution should be pretty easy to add one, because all the infrastructure is already in place, essentially three lines added to 26_numerics/random/poisson_distribution/operators/values.cc. Paolo.
Hi, On 10/12/2017 14:47, Michele Pezzutti wrote: > Hi. > > This patch intends to fix Bug 83237  Values returned by > std::poisson_distribution are not distributed correctly. > See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83237for issue > description and tests. In any case, the fix should come with a testcase, which would also validate the analysis. For this discrete distribution should be pretty easy to add one, because all the infrastructure is already in place, essentially three lines added to 26_numerics/random/poisson_distribution/operators/values.cc. Paolo.
I apologize as I am unable to run the test suitein its full form. Nevertheless, I locally tested the following patch for the test suite: *26_numerics/random/poisson_distribution/operators/values.cc Add additional test to cover bin 'floor(mu) + 1' by compiling locally values.cc and running it as main(). I had to increase N from default value N = 100000 to N = 4000000 to trigger the fault. I think that for smaller N, the deviation due to the bug is well within the confidence interval set in testDiscreteDist(). The test case below is successful with the proposed patch. diff git a/values.cc b/values.cc index 0039b7d..e12e54f 100644  a/values.cc +++ b/values.cc @@ 42,6 +42,10 @@ void test01() std::poisson_distribution<> pd3(30.0); auto bpd3 = std::bind(pd3, eng); testDiscreteDist(bpd3, [](int n) { return poisson_pdf(n, 30.0); } ); + + std::poisson_distribution<> pd4(37.17); + auto bpd4 = std::bind(pd4, eng); + testDiscreteDist<100, 4000000>(bpd4, [](int n) { return poisson_pdf(n, 37.17); } ); } On 12/11/2017 10:52 AM, Paolo Carlini wrote: > Hi, > > On 10/12/2017 14:47, Michele Pezzutti wrote: >> Hi. >> >> This patch intends to fix Bug 83237  Values returned by >> std::poisson_distribution are not distributed correctly. >> See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83237 for issue >> description and tests. > In any case, the fix should come with a testcase, which would also > validate the analysis. For this discrete distribution should be pretty > easy to add one, because all the infrastructure is already in place, > essentially three lines added to > 26_numerics/random/poisson_distribution/operators/values.cc. > > Paolo.
I lowered to N = 2500000 and still fails with a good margin. On 12/11/2017 09:58 PM, Michele Pezzutti wrote: > I apologize as I am unable to run the test suite in its full form. > > Nevertheless, I locally tested the following patch for the test suite: > > *26_numerics/random/poisson_distribution/operators/values.cc > Add additional test to cover bin 'floor(mu) + 1' > > by compiling locally values.cc and running it as main(). > > > I had to increase N from default value N = 100000 to N = 4000000 to > trigger the fault. > I think that for smaller N, the deviation due to the bug is well > within the confidence > interval set in testDiscreteDist(). > > > The test case below is successful with the proposed patch. > > > diff git a/values.cc b/values.cc > index 0039b7d..e12e54f 100644 >  a/values.cc > +++ b/values.cc > @@ 42,6 +42,10 @@ void test01() > std::poisson_distribution<> pd3(30.0); > auto bpd3 = std::bind(pd3, eng); > testDiscreteDist(bpd3, [](int n) { return poisson_pdf(n, 30.0); } ); > + > + std::poisson_distribution<> pd4(37.17); > + auto bpd4 = std::bind(pd4, eng); > + testDiscreteDist<100, 4000000>(bpd4, [](int n) { return > poisson_pdf(n, 37.17); } ); > } > > > > On 12/11/2017 10:52 AM, Paolo Carlini wrote: >> Hi, >> >> On 10/12/2017 14:47, Michele Pezzutti wrote: >>> Hi. >>> >>> This patch intends to fix Bug 83237  Values returned by >>> std::poisson_distribution are not distributed correctly. >>> See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83237 for issue >>> description and tests. >> In any case, the fix should come with a testcase, which would also >> validate the analysis. For this discrete distribution should be >> pretty easy to add one, because all the infrastructure is already in >> place, essentially three lines added to >> 26_numerics/random/poisson_distribution/operators/values.cc. >> >> Paolo. >
Hi,
On 11/12/2017 23:16, Michele Pezzutti wrote:
> I lowered to N = 2500000 and still fails with a good margin.
Good. At the moment however, I think we need a bit of rationale for the
change that you are proposing, what would you put in a comment in the
code? It's been a while since the last time I looked into these
algorithms, is there a simple way to explain why the change is needed
within the basic rejection method proposed by Devroye? Devroye's book is
freely available, have you been able to study the relevant bits already?
(http://www.nrbook.com/devroye/). He is also very approachable in
private email, if I remember correctly.
Eventually, we could also agree on a good way to extend the coverage of
the testing, maybe for gcc8 simply add the testcase, but then, for gcc9
I think we could extend it quite a bit in a consistent way, something
like a grid from 1.0 to 50 step 1.0 with an increased N. Better if we
figure out something that looks generic but would also have caught
anyway 83237, if you see what I mean. I can take care of that. For the
other discrete distributions too of course.
Thanks a lot for your help!
Paolo.
Hi. Yes, I looked at the text before submitting the patch. I contacted Devroye and he confirmed that another reader had also pointed out this bug but not the solution. I sent him my proposed patch, he will look into it (no idea when though). I would state that "comparison function for x = 1 is e^(1/78)" (which becomes 1/78 as the algorithm uses logprobabilities). I think the change is needed because otherwise, for that particular bin, the rejection probability is lower than it should be, resulting in a higher number of samples. Regarding your second point, I understand what you mean and I would be glad to help. On 12/12/2017 01:51 AM, Paolo Carlini wrote: > Hi, > > On 11/12/2017 23:16, Michele Pezzutti wrote: >> I lowered to N = 2500000 and still fails with a good margin. > Good. At the moment however, I think we need a bit of rationale for > the change that you are proposing, what would you put in a comment in > the code? It's been a while since the last time I looked into these > algorithms, is there a simple way to explain why the change is needed > within the basic rejection method proposed by Devroye? Devroye's book > is freely available, have you been able to study the relevant bits > already? (http://www.nrbook.com/devroye/). He is also very > approachable in private email, if I remember correctly. > > Eventually, we could also agree on a good way to extend the coverage > of the testing, maybe for gcc8 simply add the testcase, but then, for > gcc9 I think we could extend it quite a bit in a consistent way, > something like a grid from 1.0 to 50 step 1.0 with an increased N. > Better if we figure out something that looks generic but would also > have caught anyway 83237, if you see what I mean. I can take care of > that. For the other discrete distributions too of course. > > Thanks a lot for your help! > > Paolo. >
Hi, On 12/12/2017 19:42, Michele Pezzutti wrote: > Hi. > > Yes, I looked at the text before submitting the patch. > > I contacted Devroye and he confirmed that another reader had also > pointed out this bug but not the solution. I sent him my proposed > patch, he will look into it (no idea when though). Nice. > I would state that "comparison function for x = 1 is e^(1/78)" (which > becomes 1/78 as the algorithm uses logprobabilities). > > I think the change is needed because otherwise, for that particular > bin, the rejection probability is lower than it should be, resulting > in a higher number of samples. Ok. Ideally I would be much less nervous about committing the patch if we either 1 Had Luc's explicit green light; 2 Were able to *rigorously deduce* within the framework of the book why the change is needed. That said, the patch makes sense to me and so far holds up well in all my tests (I'm currently running a full make check). I would say, let's wait a week or so and then make the final decision. Jon, do you agree? Ideas about further testing? (eg, some code you are aware of stressing Poisson?) Thanks again, Paolo.
On 12/12/17 21:37 +0100, Paolo Carlini wrote: >Hi, > >On 12/12/2017 19:42, Michele Pezzutti wrote: >>Hi. >> >>Yes, I looked at the text before submitting the patch. >> >>I contacted Devroye and he confirmed that another reader had also >>pointed out this bug but not the solution. I sent him my proposed >>patch, he will look into it (no idea when though). >Nice. >>I would state that "comparison function for x = 1 is e^(1/78)" >>(which becomes 1/78 as the algorithm uses logprobabilities). >> >>I think the change is needed because otherwise, for that particular >>bin, the rejection probability is lower than it should be, resulting >>in a higher number of samples. >Ok. Ideally I would be much less nervous about committing the patch if >we either 1 Had Luc's explicit green light; 2 Were able to >*rigorously deduce* within the framework of the book why the change is >needed. That said, the patch makes sense to me and so far holds up >well in all my tests (I'm currently running a full make check). I >would say, let's wait a week or so and then make the final decision. >Jon, do you agree? Ideas about further testing? (eg, some code you are >aware of stressing Poisson?) No, I have nothing useful to add here, but I CC'd Ed on the PR as I'd like his input.
If Luc's explicit green light will not arrive before it is decision time, Paolo's point 2 below is doable. Il 13.12.2017 12:51 Jonathan Wakely ha scritto: > On 12/12/17 21:37 +0100, Paolo Carlini wrote: > >> Hi, On 12/12/2017 19:42, Michele Pezzutti wrote: >> >>> Hi. Yes, I looked at the text before submitting the patch. I contacted Devroye and he confirmed that another reader had also pointed out this bug but not the solution. I sent him my proposed patch, he will look into it (no idea when though). >> Nice. >> >>> I would state that "comparison function for x = 1 is e^(1/78)" (which becomes 1/78 as the algorithm uses logprobabilities). I think the change is needed because otherwise, for that particular bin, the rejection probability is lower than it should be, resulting in a higher number of samples. >> Ok. Ideally I would be much less nervous about committing the patch if we either 1 Had Luc's explicit green light; 2 Were able to *rigorously deduce* within the framework of the book why the change is needed. That said, the patch makes sense to me and so far holds up well in all my tests (I'm currently running a full make check). I would say, let's wait a week or so and then make the final decision. Jon, do you agree? Ideas about further testing? (eg, some code you are aware of stressing Poisson?) > > No, I have nothing useful to add here, but I CC'd Ed on the PR as I'd > like his input. Con Mobile Open 7 GB a 9 euro/4 sett navighi veloce con 7 GB di Internet e hai 200 minuti ed SMS a 12 cent. Passa a Tiscali Mobile! http://tisca.li/OPEN7GBFirma
I got confirmation from Luc. He also added it to the errata filethe entries regarding p. 511, page 6 of http://luc.devroye.org/errors.pdf On 12/14/2017 11:11 AM, mpezz@tiscali.it wrote: > If Luc's explicit green light will not arrive before it is decision > time, Paolo's point 2 below is doable. > > Il 13.12.2017 12:51 Jonathan Wakely ha scritto: > >> On 12/12/17 21:37 +0100, Paolo Carlini wrote: >>> Hi, On 12/12/2017 19:42, Michele Pezzutti wrote: >>>> Hi. Yes, I looked at the text before submitting the patch. I >>>> contacted Devroye and he confirmed that another reader had also >>>> pointed out this bug but not the solution. I sent him my proposed >>>> patch, he will look into it (no idea when though). >>> Nice. >>>> I would state that "comparison function for x = 1 is e^(1/78)" >>>> (which becomes 1/78 as the algorithm uses logprobabilities). I >>>> think the change is needed because otherwise, for that particular >>>> bin, the rejection probability is lower than it should be, >>>> resulting in a higher number of samples. >>> Ok. Ideally I would be much less nervous about committing the patch >>> if we either 1 Had Luc's explicit green light; 2 Were able to >>> *rigorously deduce* within the framework of the book why the change >>> is needed. That said, the patch makes sense to me and so far holds >>> up well in all my tests (I'm currently running a full make check). I >>> would say, let's wait a week or so and then make the final decision. >>> Jon, do you agree? Ideas about further testing? (eg, some code you >>> are aware of stressing Poisson?) >> No, I have nothing useful to add here, but I CC'd Ed on the PR as I'd >> like his input. >> >> >
Hi, On 23/12/2017 23:10, Michele Pezzutti wrote: > I got confirmation from Luc. > He also added it to the errata filethe entries regarding p. 511, > page 6 of http://luc.devroye.org/errors.pdf Nice. Then I'm going to commit the below, tested x86_64linux. Thanks again! Cheers, Paolo. ////////////////// 20171224 Michele Pezzutti <mpezz@tiscali.it> PR libstdc++/83237 * include/bits/random.tcc (poisson_distribution<>::operator()): Fix __x = 1 case  see updated Errata of Devroye's treatise. * testsuite/26_numerics/random/poisson_distribution/operators/ values.cc: Add test. * testsuite/26_numerics/random/pr60037neg.cc: Adjust dgerror line number. Index: include/bits/random.tcc ===================================================================  include/bits/random.tcc (revision 255991) +++ include/bits/random.tcc (working copy) @@ 1301,6 +1301,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION const double __c2 = __param._M_c2b + __c1; const double __c3 = __c2 + 1; const double __c4 = __c3 + 1; + // 1 / 78 + const double __178 = 0.0128205128205128205128205128205128L; // e^(1 / 78) const double __e178 = 1.0129030479320018583185514777512983L; const double __c5 = __c4 + __e178; @@ 1340,7 +1342,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION else if (__u <= __c4) __x = 0; else if (__u <= __c5)  __x = 1; + { + __x = 1; + // Only in the Errata, see libstdc++/83237. + __w = __178; + } else { const double __v = std::log(1.0  __aurng()); Index: testsuite/26_numerics/random/poisson_distribution/operators/values.cc ===================================================================  testsuite/26_numerics/random/poisson_distribution/operators/values.cc (revision 255991) +++ testsuite/26_numerics/random/poisson_distribution/operators/values.cc (working copy) @@ 42,6 +42,12 @@ void test01() std::poisson_distribution<> pd3(30.0); auto bpd3 = std::bind(pd3, eng); testDiscreteDist(bpd3, [](int n) { return poisson_pdf(n, 30.0); } ); + + // libstdc++/83237 + std::poisson_distribution<> pd4(37.17); + auto bpd4 = std::bind(pd4, eng); + testDiscreteDist<100, 2000000>(bpd4, [](int n) + { return poisson_pdf(n, 37.17); } ); } int main() Index: testsuite/26_numerics/random/pr60037neg.cc ===================================================================  testsuite/26_numerics/random/pr60037neg.cc (revision 255991) +++ testsuite/26_numerics/random/pr60037neg.cc (working copy) @@ 11,4 +11,4 @@ auto x = std::generate_canonical<std::size_t, // { dgerror "static assertion failed: template argument must be a floating point type" "" { target *** } 156 } // { dgerror "static assertion failed: template argument must be a floating point type" "" { target *** } 3311 } +// { dgerror "static assertion failed: template argument must be a floating point type" "" { target *** } 3317 }
diff git a/random.tcc b/random.tcc index 95bcf0a..f36587e 100644  a/random.tcc +++ b/random.tcc @@ 1301,6 +1301,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION const double __c2 = __param._M_c2b + __c1; const double __c3 = __c2 + 1; const double __c4 = __c3 + 1; + // 1 / 78 + const double __178 = 0.0128205128205128205128205128205128L; // e^(1 / 78) const double __e178 = 1.0129030479320018583185514777512983L; const double __c5 = __c4 + __e178; @@ 1340,7 +1342,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION else if (__u <= __c4) __x = 0; else if (__u <= __c5) + { __x = 1; + __w = __178; + } else { const double __v = std::log(1.0  __aurng());