On Tue, Nov 16, 2021 at 10:42 PM Kevin Ushey <kevinus...@gmail.com> wrote:
>
> Do you see this same hang in a build of R with debug symbols? Can you
> try running R with GDB, or even WinDbg or another debugger, to see
> what the call stack looks like when the hang occurs? Does the hang
> depend on the number of threads used by OpenBLAS?
>
> On the off chance it's relevant, I've seen hangs / crashes when using
> a multithreaded OpenBLAS with R on some Linux systems before, but
> never found the time to isolate a root cause.
>

That last point was a good thought, Kevin, as I had just compiled
OpenBLAS 0.3.18 multi-threaded. However, I recompiled it single-threaded
and the problem persists. The version of R I built from source last time
(2021-05-20 r80347) does not hang when calling
`dchisq(c(Inf, 1e80, 1e50, 1e40), df=10, ncp=1)`. I think I built that
with OpenBLAS 0.3.15. I can try building against that version here.

As for building with debug symbols, I have never done that before, so if
you could provide some guidance (off-list if you think it is
inappropriate to keep it here) or point me to some already-posted
advice, I would appreciate it!

Avi

> Best,
> Kevin
>
> On Tue, Nov 16, 2021 at 5:12 AM Avraham Adler <avraham.ad...@gmail.com> wrote:
> >
> > On Tue, Nov 16, 2021 at 8:43 AM Martin Maechler
> > <maech...@stat.math.ethz.ch> wrote:
> > >
> > > >>>>> Avraham Adler
> > > >>>>>     on Tue, 16 Nov 2021 02:35:56 +0000 writes:
> > >
> > > > I am building r-devel on Windows 10 64bit using Jeroen's mingw system,
> > > > and I am finding that my make check-devel hangs on the above issue.
> > > > Everything is vanilla except that I am using OpenBLAS 0.3.18. I have
> > > > been using OpenBLAS for over a decade and have not had this issue
> > > > before. Is there anything I can do to dig deeper into this issue from
> > > > my end? Could there be anything that changed in R-devel that may have
> > > > triggered this? The bugzilla report doesn't have any code attached to
> > > > it.
> > >
> > > > Thank you,
> > > > Avi
> > >
> > > Hmm.. it would've been nice to tell a bit more, instead of having all
> > > your readers search links, etc.
> > >
> > > In the bugzilla bug report PR#13309
> > > https://bugs.r-project.org/show_bug.cgi?id=13309 , the example was
> > >
> > >     dchisq(x=Inf, df=10, ncp=1)
> > >
> > > I had fixed the bug 13 years ago, in svn rev 47005
> > > with regression test in <Rsrc>/tests/d-p-q-r-tests.R :
> > >
> > >
> > >     ## Non-central Chi^2 density for large x
> > >     stopifnot(0 == dchisq(c(Inf, 1e80, 1e50, 1e40), df=10, ncp=1))
> > >     ## did hang in 2.8.0 and earlier (PR#13309).
> > >
> > >
> > > and you are seeing your version of R hanging at exactly this
> > > location?
> > >
> > >
> > > I'd bet quite a bit that the underlying code in these
> > > non-central chi square computations *never* calls BLAS and hence
> > > I cannot imagine how openBLAS could play a role.
> > >
> > > However, there must be something peculiar in your compiler setup,
> > > compilation options, ....
> > > as of course the above regression test has been run 100s of
> > > 1000s of times also under Windows in the last 13 years ..
> > >
> > > Last but not least (but really only vaguely related):
> > > There is still a FIXME in the source code (but not about
> > > hanging, but rather of losing some accuracy in border cases),
> > > see e.g. https://svn.r-project.org/R/trunk/src/nmath/dnchisq.c
> > > and for that reason I had written an R version of that C code
> > > even back in 2008 which I've made available in CRAN package
> > > DPQ a few years ago (together with many other D/P/Q
> > > distribution computations/approximations).
> > >  -> https://cran.r-project.org/package=DPQ
> > >
> > > Best,
> > > Martin
> > >
> >
> > Hello, Martin.
> >
> > Apologies, I thought the PR # was sufficient. Yes, I am seeing this at
> > this exact location. This is what I saw in d-p-q-r-tst-2.Rout.fail and
> > I then ran d-p-q-r-tst.R line-by-line and R hung precisely after
> > `stopifnot(0 == dchisq(c(Inf, 1e80, 1e50, 1e40), df=10, ncp=1))`.
> >
> > Is it at all possible that this has to do with the recent change from
> > bd0 to ebd0 (PR #15628) [1]?
> >
> > For completeness, I ran all the code _beneath_ the call, and while
> > nothing else caused an infinite loop, I posted what I believe may be
> > unexpected results below.
> >
> > Thank you,
> >
> > Avi
> >
> > [1]: https://bugs.r-project.org/show_bug.cgi?id=15628
> >
> > > ## FIXME ?!: MxM/2 seems +- ok ??
> > > (dLmM <- dnbinom(xL, mu = 1, size = MxM)) # all NaN but the last
> > Warning in dnbinom(xL, mu = 1, size = MxM) : NaNs produced
> > [1] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
> > NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 0
> > > (dLpI <- dnbinom(xL, prob=1/2, size = Inf))# ditto
> > Warning in dnbinom(xL, prob = 1/2, size = Inf) : NaNs produced
> > [1] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
> > NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 0
> > > (dLpM <- dnbinom(xL, prob=1/2, size = MxM))# ditto
> > Warning in dnbinom(xL, prob = 1/2, size = MxM) : NaNs produced
> > [1] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
> > NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 0
> > >
> > > d <- dnbinom(x, mu = mu, size = Inf) # gave NaN (for 0 and L), now all 0
> > Warning in dnbinom(x, mu = mu, size = Inf) : NaNs produced
> > > p <- pnbinom(x, mu = mu, size = Inf) # gave all NaN, now uses ppois(x, mu)
> > Warning in pnbinom(x, mu = mu, size = Inf) : NaNs produced
> > >
> > > pp <- (0:16)/16
> > > q <- qnbinom(pp, mu = mu, size = Inf) # gave all NaN
> > > set.seed(1); NI <- rnbinom(32, mu = mu, size = Inf)# gave all NaN
> > > set.seed(1); N2 <- rnbinom(32, mu = mu, size = L )
> > > stopifnot(exprs = {
> > +     all.equal(d, c(0.006737947, 0.033689735, 0.0842243375, 0.140373896, 0,0,0,0), tol = 9e-9)# 7.6e-10
> > +     all.equal(p, c(0.006737947, 0.040427682, 0.1246520195, 0.265025915, 1,1,1,1), tol = 9e-9)# 7.3e-10
> > +     all.equal(d, dpois(x, mu))# current implementation: even identical()
> > +     all.equal(p, ppois(x, mu))
> > +     q == c(0, 2, 3, 3, 3, 4, 4, 4, 5, 5, 6, 6, 6, 7, 8, 9, Inf)
> > +     q == qpois(pp, mu)
> > +     identical(NI, N2)
> > + })
> > Error: d and c(0.006737947, 0.033689735, 0.0842243375, 0.140373896, 0, 0, .... are not equal:
> >   'is.NA' value mismatch: 0 in current 1 in target
> >
> > > if(!(onWindows && arch == "x86")) {
> > +     ## This gave a practically infinite loop (on 64-bit Lnx, Windows; not in 32-bit)
> > +     tools::assertWarning(p <- pchisq(1.00000012e200, df=1e200, ncp=100),
> > +                          "simpleWarning", verbose=TRUE)
> > +     stopifnot(p == 1)
> > + }
> > Asserted warning: pnchisq(x=1e+200, f=1e+200, theta=100, ..): not converged in 1000000 iter.
> >
> > [This may be OK, AA]
> >
> > > ## Show the (mostly) small differences :
> > > all.equal( qs, qpU, tol=0)
> > [1] "Mean relative difference: 1.572997e-16"
> > > all.equal(-qs, qp., tol=0)
> > [1] "Mean relative difference: 1.572997e-16"
> > > all.equal(-qp.,qpU, tol=0) # typically TRUE (<==> exact equality)
> > [1] "Mean relative difference: 4.710277e-16"
> > > stopifnot(exprs = {
> > +     all.equal( qs, qpU, tol=1e-15)
> > +     all.equal(-qs, qp., tol=1e-15)
> > +     all.equal(-qp., qpU, tol=1e-15)# diff of 4.71e-16 in 4.1.0 w/icc (Eric Weese)
> > + })
> > > ## both failed very badly in R <= 4.0.x
> >
> > ______________________________________________
> > R-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel