On Fri, 26 Aug 2011, Robert G. Brown wrote: > Let's try a bit of "paper fiddling". The expected number of filled slots > is (this is actual code, not pseudocode, for n slots): > > nlogn = log10(n)*n; > expected = (n - n*pow(1.0-1.0/(1.0*n),nlogn)); > > The reasoning is enormously simple. The probability of a slot being > empty after one pull is (1 - 1/n). After nlogn pulls, it is p_e = (1 - > 1/n)^nlogn. The probability of a slot being filled is thus 1 - p_e, and > given n slots n - n*(1-1/n)^nlogn of them "should" be filled, within > random noise, n*(1-1/n)^nlogn of them "should" be empty.
Silly me. All of the anonymous slots are at least asymptotically independent (not necessarily obvious, but true from symmetry, I think, subject to the weak constraint that the total population of all of the slots has to add up to the number of trials so there are probably n-1 degrees of freedom in the Pearson test). We have p and q. The distribution is binomial and of course I know the binomial distribution and its sigma. I can easily build any one of several tests on top of this (simple binomial or even multinomial, since I effectively have the hit frequency for n slots and it should BE the binomial distribution), and in fact have two or three already that are very similar to this on a smaller scale. It's what comes of hacking out -- sorry, "fiddling" out -- quick solutions and tests late at night when you're tired and ought to be sleeping. A bit of coffee makes a world of difference...:-) I'll have to think a bit about it and make sure that this isn't already done, better, in e.g. STS, but it might yet see the light of day as an actual dieharder test. BTW, I'm not replying to your space alien ET post (to the Beowulf list in reply to an already OT discussion of martingales that arose out of a discussion of good RNGs and seeding strategies sorry y'all but hey, at least it is entertaining?) simply because my jaw is sore from hitting the ground so many times while reading it. Those are some top-quality hallucinogens, yes they are... We will now return to your regularly scheduled discussion of boring things like bandwidth, memory reliability, parallel algorithms and the like, you know, on-topic stuff. But if any of y'all ever need to test rngs or flame schemes to "win" non-zero-sum games by means of "strategy", you know who to call...;-) Somewhere upstairs I have this nifty book on game theory and in a pinch I can even trot out an actual game matrix and analyze outcomes algefiddlingbraically! rgb P.S. -- Vincent, all of these simple problems were solved by mathematicians and statisticians so very, very, long ago, beginning with the work of Pascal and Fermat (there are names to conjure with, eh?) solving the problem posed by the Chevalier de Mere regarding an even bet on double sixes happening at least once in 24 throws: actual probability of double sixes per throw are (of course) 1/36, probability of no double six in 24 throws are (35/36)^24, odds of at least one are therefore 1 - (35/36)^24 = 0.4914038761 -- all paper fiddling, mind you -- a result that is eerily reminiscent of the solution to your problem, but with fewer slots. So at even odds it is -- barely -- a sucker's bet. But a margin of 0.86% is enough to empty even the deepest pockets, over time. Now all you have to do is advance your actual knowledge of statistics beyond that realized by an idly rich French nobleman in 1654 (who still was wise enough to recognize that it wasn't an even bet and consulted the best of the best of the minds of his day to prove it). You have a mere 357 years to go...:-) P.P.S -- If "all rngs" were really as bad as you assert, does it not stand to reason that "all Monte Carlo computations" that use them would all get egregiously incorrect results? And yet they don't. In fact, in problems (like the Ising model in 2D) where known solutions exist, they agree basically perfectly with the theoretical solution, and of course it is easy to compare a wide range of integrals and Markov process outcomes with theory. So if you used your simple common sense you would construct a mental argument like: "Either I, in my brilliance, have discovered an egregious flaw in all random number generators used by all of those STUPID computer scientists, mathematicians, and physicists for decades to do their long and complex computations that no doubt all got equally egregiously wrong answers; Or Those computer scientists, mathematicians, and physicists are actually pretty smart and aggressively check their work (and each other's work) with a strong incentive to discover problems. It is rather probable that any such egregious error would have been long ago discovered; therefore there is almost certainly a serious error in my own reasoning." Seriously, dude. Ask yourself "Am I really smarter and better informed than Pascal, Fermat, Laplace, Bayes, not to mention all of those contemporary humans who have been devoting entire well-educated careers to random numbers as if all of modern e-commerce depended on them (it does) or is it just barely possible that I've made a mistake?" Come on, you can do it. I know it is difficult for you, but try.. Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:r...@phy.duke.edu _______________________________________________ Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf