On Fri, 26 Aug 2011, Vincent Diepeveen wrote:

> If you have a domain of 0..3 where a generator generates and your modulo n
> is just n-1, obviously that means it'll map a tad more to 0.
>
> Basically the deviation one would be able to measure in such case is that
> if we have a generator that runs over a field of say size m and we want to
> map that onto n entries then we have the next formula:
>
>    m = x * n + y;
>
> Now your theory is basically, if I summarize it, that in such case the
> entries 0..y-1 will have a tad higher hit rate than the entries y..n-1.
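Before we get to "tads", let's pin down exactly what that formula implies.
Write the generator's range as m = x*n + y with 0 <= y < n; reducing mod n
then sends x+1 of the generator's m possible outputs to each residue in
0..y-1 and only x to each residue in y..n-1. A brute-force tally shows it
directly -- minimal illustrative C, with the toy values m = 10 and n = 4
made up purely for the example:

#include <stdio.h>

/* Illustrative only: count how often each residue 0..n-1 is hit when the
 * full generator range 0..m-1 is reduced mod n.  The toy values m = 10,
 * n = 4 are made up for the example; nothing here is a real generator. */
int main(void)
{
    const unsigned long m = 10, n = 4;
    unsigned long count[4] = {0};       /* one bin per residue, n = 4 bins */

    for (unsigned long i = 0; i < m; i++)
        count[i % n]++;

    for (unsigned long j = 0; j < n; j++)
        printf("residue %lu: %lu hits\n", j, count[j]);
    /* prints 3, 3, 2, 2 -- residues 0..y-1 get x+1 hits, y..n-1 get x */
    return 0;
}

With m = 10 and n = 4 we get x = 2 and y = 2, so the first two residues each
collect one extra hit. That "tad" is the whole story; the only question is
how big it is relative to everything else.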
What's a "tad" when you're measuring the quality of an RNG? I'm just curious. Could you be more specific? Just what are the limits, specifically, if your random number is a 30 bit uint that makes numbers in the range of 0-1 billion (nearly all generators make uints, with a few exceptions that usually make less than 32 bits -- 64 bit generators are still a rarity although of course two 32 rands makes one 64 bit rand) and you mod with m, especially a nice large m that doesn't integer divide r like 1.5 million? That means that each integer in the range 0 to 1.5 million gets 666 repetitions as the entire range of r gets sampled, except the first million that get 667. That means that the odds of pulling a number from 1 to a million are 1e6*667/1.e9 = .667. The odds of pulling a number from the second million are 0.5*666/1.e9 = .333 = 1 - .667. An old lady with terrible eyes could detect such an imbalance in probability from across the street -- you wouldn't even need a "random number generator tester". Weren't you advocating using this for nice large m like a million? I think that you were. No, wait! You were advocating something like one BILLION, right? Wrong direction to make it better, dude, this makes it worse. Note that this scales pretty well. For m in the range of thousands, the imbalance will be something like 0.666667 and 0.333330 -- still pretty easy to detect with any halfway decent RNG tester. Basically, you don't get (immeasurably) close to a uniform distribution in the weighting of any integer until you get down to (unsurprisingly) m of order unity compared to a uint, which at which point it basically becomes as accurate as m*(r/r_max) was in the first place. Note also that you've created an imbalance in the weighting of the integers you are sampling that is far, far greater and more serious than any other failure of randomness that your RNG might have. So much so that you couldn't possibly see any other -- it would be a signal swamped in the noise of your error, even for m merely in the thousands -- one part per million errors in randomness are easy to detect in simulations that draw 10^9 or so random numbers (which is still a SMALL TEST simulation -- real simulations draw 10^16 or 10^18 and your error would put answers on another PLANET compared to where they should be. Most coders probably can actually work this out with a pencil, and so I repeat no, nobody competent uses a modulus to generate integers in a fixed range in circumstances where the quality of the result matters, e.g. numerical simulation or cryptography as opposed to gaming, unless the modulus is a power of two. > However if x is large enough that shouldn't be a big problem. > > If we map now in the test i'm doing onto say a few million to a billion > entries, > the size of that x is a number of 40+ bits for most RNG's. x=32, a uint for most RNGs. Or to put it another way, RNGs generate a bit stream, which they might do with an algorithm that generates 30,31,32, or more bits at a time, but the prevalence of 32 bit architectures and the fact that it is trivial to concatenate 32s to get 64+ bits when desired has slowed the development of true 64 bit RNGs. Eventually there will be some, of course, and it will STILL be a mistake to use a modulus to create random integers in some general range. A bad algorithm is a bad algorithm, and this makes sense only if speed is more important than randomness (in which case one has to wonder, why use a 64 bit RNG in the first place, why use a good RNG in the first place). 
> So that means that the deviation of the effect you show above is of the
> order of magnitude of 1 / 2^40 in such case, which is rather small.

Except that it isn't, as I showed in a fair bit of detail. It might be if x
were as large as you claim, which it isn't (in general or even "commonly"),
and if one confined m to be order unity. For m of order 2^20 (a million),
the error for a 2^40 range is of order 2^20/2^40 = 2^-20 -- a millionth --
which shows up even in single precision floating point. Why bother testing
such a stream for randomness? It fails. You've made it fail. It fails
spectacularly if the generator is perfect, if the goddess Ifni herself
produces the string of digits. It cannot succeed.

> Especially because the 'test', if you want to call it that, is operating
> in the granularity O(log n), we can fully ignore then the expected
> deviation granularity O(2^40).

Well, except that basically 100% of the rngs in the GSL pass your "test"
when it is written correctly. They also produce precisely the
correct/expected result (within easily understandable and expected random
scatter) on top of that if they are "good" rngs. So the "test" isn't much of
an actual test, and your assertion that "all rngs fail it" is false and
based on a methodology that introduces errors many orders of magnitude
greater than the upper bounds the generators are known to satisfy.

Given this fact, which I have personally verified, do you imagine that there
might be other errors in your actual (not your pseudo) code? You gotta
wonder. If you've tested a Mersenne Twister with your "test" and it fails to
pass, then either an MT is crap and all of the theoretical papers and
experienced testers who have tested it with sophisticated and peer-reviewed
tools are stupid poo-poo heads, or, well, could it be that your test or your
implementation of the MT is crap, and the MT itself is exactly what everyone
else thinks it is, based on extensive "paper fiddling" and enormous piles of
empirical testing evidence produced by actual statisticians and rng experts?
Which is to say, a damn good pseudo-RNG, decorrelated in some 600+
dimensions, that passes nearly all known tests with flying colors.

Hmmm, let's put on our Bayesian thinking caps, consider the priors, and try
very hard to guess which one is much, much more likely on Jaynes' "decibel"
scale of probabilities. Would you say that it is 20 decibels more likely
that the MT is good and the test is broken? 50? 200? I like 2000 or
thereabouts myself, or, as we in the business might say, "it is a fact" that
your test is broken, since 10^200 is a really big number, comparatively
speaking.

Now, it would be nice if you apologized to "all RNGs" and "all programmers"
and the various other groups you indicted in your little fallacious rant,
but I'll consider myself enormously fortunate at this point if you simply
acknowledge that maybe, just maybe, your original pronouncement -- that all
rngs produce an egregious, trivially verifiable, excessive degree of first
order sequential uniformity -- is categorically and undeniably false.

Of course, if you think I'm lying just to make you look bad, I can post a
modified version of dieharder with your test embedded so that absolutely
anybody can see for themselves that all of the embedded generators pass your
test and that not one single thing you asserted in grandiosely producing it
was correct. The code is quite short and anybody can understand it.
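For anyone who wants to check the claim before I get around to posting the
modified dieharder: the corrected check has roughly the shape below. This is
a bare-bones sketch against the GSL (gsl_rng_mt19937 plus
gsl_rng_uniform_int, which does the range reduction without the mod bias),
not the actual dieharder code I ran, and the bin count, sample size, and
seed are arbitrary choices for the illustration.

#include <stdio.h>
#include <stdlib.h>
#include <gsl/gsl_rng.h>

int main(void)
{
    const unsigned long n = 1000;         /* bins (arbitrary)  */
    const unsigned long N = 100000000UL;  /* draws (arbitrary) */
    unsigned long *count = calloc(n, sizeof *count);
    gsl_rng *rng = gsl_rng_alloc(gsl_rng_mt19937);
    double expected = (double)N / (double)n;
    double chisq = 0.0;

    gsl_rng_set(rng, 12345);              /* arbitrary seed */

    /* Draw N uniform integers in 0..n-1, properly reduced, and tally them. */
    for (unsigned long i = 0; i < N; i++)
        count[gsl_rng_uniform_int(rng, n)]++;

    /* First order frequency check: chi-squared of the bin counts against
     * the flat expectation. */
    for (unsigned long j = 0; j < n; j++) {
        double d = (double)count[j] - expected;
        chisq += d * d / expected;
    }

    /* For a decent generator this should land near n-1, give or take a few
     * times sqrt(2*(n-1)) from run to run. */
    printf("chi-squared = %.1f (expect ~%lu)\n", chisq, n - 1);

    gsl_rng_free(rng);
    free(count);
    return 0;
}

Link it against the GSL (-lgsl -lgslcblas) and swap gsl_rng_mt19937 for any
other gsl_rng type you like: with the range reduction done correctly, the
first order frequencies come out exactly as boring as they should be.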
Or you can take my moderately expert word for it -- the results I posted are
honest results, produced using real RNGs from a real numerical library in
the real test, written by block copying your pseudocode, converting/realizing
it in C, and fixing your obvious error in the generation of random ints in
the range 0..m by using a tested algorithm -- written by people who actually
know what they are doing -- that is IN the aforementioned real numerical
library.

Seriously, it is done. Finished. You're wrong. Say "I'm sorry, Mr. Mersenne
Twister; if my test passes randu, then how could it possibly fail you?" And
don't forget to apologize to AES, RSA, DES, and all of the other encryption
schemes too. They all feel real bad that you called them stupid poo-poo
heads unable to pass the simplest first order frequency test one can
imagine, since they all had to pass MUCH more rigorous and often government
mandated testing to ever get adopted as the basis for encryption.

I don't expect an apology to me for being indicted along with ALL the OTHER
programmers in the world for being stupid enough to use mod to make a
supposedly uniformly distributed range of m rands. Not even Numerical
Recipes was that boneheaded. But it's OK, we all know that we didn't really
ever do that, and if you did (and continue to do, apparently, learning
nothing from my patient and thorough exposition of how it produces errors
that are vastly greater than the ones that you think you are detecting),
that's a problem for who? That's right, mister. For you. You'll just keep
getting wroooooong answers, and then announcing them as fact and making
yourself look silly. Or even sillier, if that is possible.

   rgb

Robert G. Brown                        http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525  email: r...@phy.duke.edu

_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf