>>>>> Avraham Adler on Thu, 18 Nov 2021 02:18:54 +0000 writes:
> Hello. I have isolated the issue: it is the > fused-multiply-add instruction set (FMA on Intel > processors). Running -march=skylake -mno-fma not only does > not hang, but passes make check-all (using R's native > BLAS). My intuition remains that something in the new > more precise ebd0 code used in dpois_raw—called by dgamma, > called by dchsq, called by dnchisq—is hanging when the > assembler uses FMA. Unfortunately, I have come across > other cases online where the extra precision and the > different assembler code of FMA vs. non-FMA has caused > bugs, such as [1]. Page 5 of this paper by Dr. William > Kahan sheds some light on why this may be happening [2] > (PDF). > Martin & Morton, having written (PR#15628 [3]) and/or > implemented the ebd0 code that is now being used, can > either of you think of any reason why it would hang if > compiled using FMA? I vaguely remember I had a version of ebd0(), either Morton Welinder's original, or a slight modification of it that needed some mending, because in some border case, there was an out of array-boundary indexing... but that's just a vague recollection. I had investigated ebd0()'s behavior quite a bit, also notably the version -- together with a pure R code version -- in my CRAN package DPQ, yesterday updated to version 0.5-0 on CRAN {written in Summer, but published to CRAN only yesterday} where I have dpois_raw() optionally using several experimental versions of bd0(), and both 'pure R' and a C version of ebd0(), as DPQ::ebd0() and DPQ::edb0C() each with an option 'verbose' which shows you which branches are chosen for the given arguments. So, if you install this version (0.5-0 or newer) from the development sources, using the *same* FMA configuration, I hope you should see the same "hanging" but would be able to see some more.. ? Can you install it from R-forge install.packages("DPQ", type = "source", repos="http://R-Forge.R-project.org") and then experiment? I'd be grateful {and we maybe can move "off - mailing list"} Thank you in advance, Martin Martin Maechler ETH Zurich and R Core team > Again, I'm not a professional, but > line 325 of the ebd0 function in bd0.c [4] has "ADD1(-x * > log1pmx ((M * fg - x) / x))" which looks like a > Multiply-Add to me, at least in the inner parenthesis. Is > there anything that can be, or should be, done with the > code to prevent the hang, or must we forbid the use of FMA > instructions (and I guess FMA4 on AMD processors) when > compiling R? > Also, What happens in the case where M/x neither over- nor > under-flowed, M_LN2 * ((double) -e) <= 1. + DBL_MAX / x, > fg != 1, and after 4 loops of lines 329 & 330, *yh is > still finite? How does ebd0 exit in that case? There is no > "return" after line 331. Am I missing something? Could > that be related to this issue? > As an aside, could ebd0 be enhanced by using FMA > instructions on processors which support them? > Thank you very much, > Avi > [1] > https://flameeyes.blog/2014/10/27/the-subtlety-of-modern-cpus-or-the-search-for-the-phantom-bug/ > [2] > https://people.eecs.berkeley.edu/~wkahan/ieee754status/IEEE754.PDF > [3] https://bugs.r-project.org/show_bug.cgi?id=15628 [4] > https://github.com/wch/r-source/blob/trunk/src/nmath/bd0.c > On Wed, Nov 17, 2021 at 3:55 PM Avraham Adler > <avraham.ad...@gmail.com> wrote: >> >> Hello, Martin et. al. >> >> I apologize for top posting, but I believe I have tracked >> down the difference why last time my build worked and now >> it hangs on `dchisq(c(Inf, 1e80, 1e50, 1e40), df=10, >> ncp=1)`. and it's NOT the BLAS. I built against both 3.15 >> AND R's vanilla and it hung both times. The issue was >> passing "march=skylake". I own an i7-8700K which gcc >> considers a skylake. When I pass mtune=skylake, it does >> not hang and the make check-devel after the build >> completes. >> >> Below is a list of the different flags passed when using >> mtune vs. march. It stands to reason that at least one >> of them contributed to the hanging issue which Martin >> fixed in >> https://bugs.r-project.org/show_bug.cgi?id=13309. While I >> recognize the obvious ones, I'm not an expert and do not >> understand which if any may be the culprit. For >> reference, most of these flags are described here: >> https://gcc.gnu.org/onlinedocs/gcc-8.3.0/gcc/x86-Options.html#x86-Options. >> >> All the following flags are DISABLED for mtune=skylake >> (so march=x86-64) and ENABLED when passing >> march-skylake. For the record, I usually passed march in >> the past without problem: >> >> madx, maes, mavx, mavx2, mbmi, mbmi2, mclflushopt, mcx16, >> mf16c, mfma, mfsgsbase, mhle, mlzcnt, mmovbe, mpclmul, >> mpopcnt, mprfchw, mrdrnd, mrdseed, msahf, msgx, msse3, >> msse4, msse4.1, msse4.2, mssse3, mxsave, mxsavec, >> mxsaveopt, and mxsaves. >> >> Inversely, mno-sse4 is enabled when using mtune and >> disabled when using arch, of course. >> >> For completeness, the following two are disabled on both >> mtune and march but enabled when passing march=native, >> otherwise the latter is the same as march=skylake: mabm >> and mrtm. Obviously these cannot contribute to the >> hanging issue. >> >> Any ideas, especially from the experts who understand how >> the flags would address the code in dchisq, would be >> greatly appreciated. >> >> Thank you, >> >> Avi >> >> On Wed, Nov 17, 2021 at 7:15 AM Avraham Adler >> <avraham.ad...@gmail.com> wrote: >> > >> > On Tue, Nov 16, 2021 at 10:42 PM Kevin Ushey >> <kevinus...@gmail.com> wrote: >> > > >> > > Do you see this same hang in a build of R with debug >> symbols? Can you > > try running R with GDB, or even >> WinDbg or another debugger, to see > > what the call >> stack looks like when the hang occurs? Does the hang > > >> depend on the number of threads used by OpenBLAS? >> > > >> > > On the off chance it's relevant, I've seen hangs / >> crashes when using > > a multithreaded OpenBLAS with R on >> some Linux systems before, but > > never found the time >> to isolate a root cause. >> > > >> > >> > >> > This last was a good thought, Kevin, as I had just >> compiled OpenBLAS > 3.18 multi-threaded, but I recompiled >> it single threaded and it still > crashes. The version of >> R I built from source last time, (2021-05-20 > r80347), >> does not hang when calling `dchisq(c(Inf, 1e80, 1e50, >> 1e40), > df=10, ncp=1)`. I think I built that with >> OpenBLAS 3.15. I can try > doing that here. As for >> building with debug symbols, I have never done > that >> before, so if you could provide some guidance (off-list >> if you > think it is inappropriate to keep it here) or >> point me in the > direction of some already posted >> advice, I would appreciate it! >> > >> > Avi >> > >> > >> > >> > > Best, > > Kevin >> > > >> > > On Tue, Nov 16, 2021 at 5:12 AM Avraham Adler >> <avraham.ad...@gmail.com> wrote: >> > > > >> > > > On Tue, Nov 16, 2021 at 8:43 AM Martin Maechler > > >> > <maech...@stat.math.ethz.ch> wrote: >> > > > > >> > > > > >>>>> Avraham Adler > > > > >>>>> on Tue, 16 Nov >> 2021 02:35:56 +0000 writes: >> > > > > >> > > > > > I am building r-devel on Windows 10 64bit using >> Jeroen's mingw system, > > > > > and I am finding that my >> make check-devel hangs on the above issue. > > > > > >> Everything is vanila except that I am using OpenBLAS >> 0.3.18. I have > > > > > been using OpenBLAS for over a >> decade and have not had this issue > > > > > before. Is >> there anything I can do to dig deeper into this issue >> from > > > > > my end? Could there be anything that >> changed in R-devel that may have > > > > > triggered >> this? The bugzilla report doesn't have any code attached >> to > > > > > it. >> > > > > >> > > > > > Thank you, > > > > > Avi >> > > > > >> > > > > Hmm.. it would've be nice to tell a bit more, >> instead of having all > > > > your readers to search >> links, etc. >> > > > > >> > > > > In the bugzilla bug report PR#13309 > > > > >> https://bugs.r-project.org/show_bug.cgi?id=13309 ,the >> example was >> > > > > >> > > > > dchisq(x=Inf, df=10, ncp=1) >> > > > > >> > > > > I had fixed the bug 13 years ago, in svn rev >> 47005 > > > > with regression test in >> <Rsrc>/tests/d-p-q-r-tests.R : >> > > > > >> > > > > >> > > > > ## Non-central Chi^2 density for large x > > > > >> stopifnot(0 == dchisq(c(Inf, 1e80, 1e50, 1e40), df=10, >> ncp=1)) > > > > ## did hang in 2.8.0 and earlier >> (PR#13309). >> > > > > >> > > > > >> > > > > and you are seeing your version of R hanging at >> exactly this > > > > location? >> > > > > >> > > > > >> > > > > I'd bet quite a bit that the underlying code in >> these > > > > non-central chi square computations *never* >> calls BLAS and hence > > > > I cannot imagine how >> openBLAS could play a role. >> > > > > >> > > > > However, there must be something peculiar in your >> compiler setup, > > > > compilation options, .... > > > >> > as of course the above regression test has been run >> 100s of > > > > 1000s of times also under Windows in the >> last 13 years .. >> > > > > >> > > > > Last but not least (but really only vaguely >> related): > > > > There is still a FIXME in the source >> code (but not about > > > > hanging, but rather of >> loosing some accuracy in border cases), > > > > see >> e.g. https://svn.r-project.org/R/trunk/src/nmath/dnchisq.c >> > > > > and for that reason I had written an R version of >> that C code > > > > even back in 2008 which I've made >> available in CRAN package > > > > DPQ a few years ago >> (together with many other D/P/Q > > > > distribution >> computations/approximations). > > > > -> >> https://cran.r-project.org/package=DPQ >> > > > > >> > > > > Best, > > > > Martin >> > > > > >> > > > >> > > > Hello, Martin. >> > > > >> > > > Apologies, I thought the PR # was sufficient. Yes, >> I am seeing this at > > > this exact location. This is >> what I saw in d-p-q-r-tst-2.Rout.fail and > > > I then >> ran d-p-q-r-tst.R line-by-line and R hung precisely after >> > > > `stopifnot(0 == dchisq(c(Inf, 1e80, 1e50, 1e40), >> df=10, ncp=1))`. >> > > > >> > > > Is it at all possible that this has to do with the >> recent change from > > > bd0 to ebd0 (PR #15628) [1]? >> > > > >> > > > For completeness, I ran all the code _beneath_ the >> call, and while > > > nothing else cause an infinite >> loop, I posted what I believe may be > > > unexpected >> results below, >> > > > >> > > > Thank you, >> > > > >> > > > Avi >> > > > >> > > > [1]: >> https://bugs.r-project.org/show_bug.cgi?id=15628 >> > > > >> > > > > ## FIXME ?!: MxM/2 seems +- ok ?? > > > > (dLmM >> <- dnbinom(xL, mu = 1, size = MxM)) # all NaN but the >> last > > > Warning in dnbinom(xL, mu = 1, size = MxM) : >> NaNs produced > > > [1] NaN NaN NaN NaN NaN NaN NaN NaN >> NaN NaN NaN NaN NaN NaN NaN NaN > > > NaN NaN NaN NaN NaN >> NaN NaN NaN NaN NaN NaN 0 > > > > (dLpI <- dnbinom(xL, >> prob=1/2, size = Inf))# ditto > > > Warning in >> dnbinom(xL, prob = 1/2, size = Inf) : NaNs produced > > > >> [1] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN >> NaN NaN NaN > > > NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN >> NaN 0 > > > > (dLpM <- dnbinom(xL, prob=1/2, size = >> MxM))# ditto > > > Warning in dnbinom(xL, prob = 1/2, >> size = MxM) : NaNs produced > > > [1] NaN NaN NaN NaN NaN >> NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN > > > NaN NaN >> NaN NaN NaN NaN NaN NaN NaN NaN NaN 0 >> > > > >> > > > > d <- dnbinom(x, mu = mu, size = Inf) # gave NaN >> (for 0 and L), now all 0 > > > Warning in dnbinom(x, mu = >> mu, size = Inf) : NaNs produced > > > > p <- pnbinom(x, >> mu = mu, size = Inf) # gave all NaN, now uses ppois(x, >> mu) > > > Warning in pnbinom(x, mu = mu, size = Inf) : >> NaNs produced >> > > > >> > > > > pp <- (0:16)/16 > > > > q <- qnbinom(pp, mu = mu, >> size = Inf) # gave all NaN > > > > set.seed(1); NI <- >> rnbinom(32, mu = mu, size = Inf)# gave all NaN > > > > >> set.seed(1); N2 <- rnbinom(32, mu = mu, size = L ) > > > >> > stopifnot(exprs = { > > > + all.equal(d, c(0.006737947, >> 0.033689735, 0.0842243375, > > > 0.140373896, 0,0,0,0), >> tol = 9e-9)# 7.6e-10 > > > + all.equal(p, c(0.006737947, >> 0.040427682, 0.1246520195, > > > 0.265025915, 1,1,1,1), >> tol = 9e-9)# 7.3e-10 > > > + all.equal(d, dpois(x, mu))# >> current implementation: even identical() > > > + >> all.equal(p, ppois(x, mu)) > > > + q == c(0, 2, 3, 3, 3, >> 4, 4, 4, 5, 5, 6, 6, 6, 7, 8, 9, Inf) > > > + q == >> qpois(pp, mu) > > > + identical(NI, N2) > > > + }) > > > >> Error: d and c(0.006737947, 0.033689735, 0.0842243375, >> 0.140373896, 0, > > > 0, .... are not equal: > > > >> 'is.NA' value mismatch: 0 in current 1 in target >> > > > >> > > > > if(!(onWindows && arch == "x86")) { > > > + ## >> This gave a practically infinite loop (on 64-bit Lnx, >> Windows; > > > not in 32-bit) > > > + >> tools::assertWarning(p <- pchisq(1.00000012e200, >> df=1e200, ncp=100), > > > + "simpleWarning", >> verbose=TRUE) > > > + stopifnot(p == 1) > > > + } > > > >> Asserted warning: pnchisq(x=1e+200, f=1e+200, theta=100, >> ..): not > > > converged in 1000000 iter. >> > > > >> > > > [This may be OK, AA] >> > > > >> > > > > ## Show the (mostly) small differences : > > > > >> all.equal( qs, qpU, tol=0) > > > [1] "Mean relative >> difference: 1.572997e-16" > > > > all.equal(-qs, qp., >> tol=0) > > > [1] "Mean relative difference: 1.572997e-16" >> > > > > all.equal(-qp.,qpU, tol=0) # typically TRUE (<==> >> exact equality) > > > [1] "Mean relative difference: >> 4.710277e-16" > > > > stopifnot(exprs = { > > > + >> all.equal( qs, qpU, tol=1e-15) > > > + all.equal(-qs, >> qp., tol=1e-15) > > > + all.equal(-qp., qpU, tol=1e-15)# >> diff of 4.71e-16 in 4.1.0 w/icc > > > (Eric Weese) > > > >> + }) > > > > ## both failed very badly in R <= 4.0.x >> > > > >> > > > ______________________________________________ > > >> > R-devel@r-project.org mailing list > > > >> https://stat.ethz.ch/mailman/listinfo/r-devel ______________________________________________ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel