On Sat, Jul 17, 2010 at 18:01, Steven D'Aprano <st...@pearwood.info> wrote:
> Having generated the digits, it might be useful to look for deviations
> from randomness. There should be approximately equal numbers of each
> digit (100,000,000 each of 0, 1, 2, ..., 9), of each digraph
> (10,000,000 each of 00, 01, 02, ..., 98, 99), trigraphs (1,000,000 each
> of 000, ..., 999) and so forth.

I've been doing a bit of that. I found approximately equal numbers of
each digit (including the zeros :) ).
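In case it helps anyone following along, a tally like that takes only a
few lines. This is just a minimal sketch, not my actual script (that's
in the pastebin below), and 'digits.txt' stands in for the real file
name:

from collections import Counter

# Tally every character in the file in one pass.
with open('digits.txt') as f:
    counts = Counter(f.read())

for d in '0123456789':
    print(d, counts[d])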
Then I thought I'd look at pairs of the same digit ('00', '11', and so
on). See my <http://tutoree7.pastebin.com/S9JzmmtY>. The results for
the 1 billion file start at line 78, and look good to me.

I might try trigraphs where the 2nd digit is 2 more than the 1st and
the 3rd is 2 more than the 2nd, wrapping around past 9. E.g. '024',
'135', '791', '802'. Or maybe I've had enough.
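If I do try it, the count would go something like this (an untested
sketch; in a billion random digits each of the ten qualifying patterns
should turn up roughly a million times, about ten million in all):

# Count windows of three digits where each digit is 2 more than the
# one before it, mod 10 -- e.g. '024', '135', '791', '802'.
def count_step2_trigraphs(s):
    count = 0
    for i in range(len(s) - 2):
        a, b, c = int(s[i]), int(s[i + 1]), int(s[i + 2])
        if b == (a + 2) % 10 and c == (b + 2) % 10:
            count += 1
    return count

print(count_step2_trigraphs('0240802'))  # 2 ('024' and '802')

A character-by-character loop like that will crawl on a billion
digits, but it shows the idea.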
BTW Steve, my script avoids the problem you mentioned, of counting two
'55's in a '555' string. I get only one, but two in '5555'. See line 18
of the paste, in the while loop.
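The paste has the real thing, but the non-overlapping count amounts to
something like this:

# Count non-overlapping occurrences of a two-digit pair, scanning left
# to right and skipping past each match, so '555' yields 1 and '5555'
# yields 2.
def count_pair(s, pair):
    n = 0
    i = s.find(pair)
    while i != -1:
        n += 1
        i = s.find(pair, i + 2)
    return n

print(count_pair('555', '55'))   # 1
print(count_pair('5555', '55'))  # 2

Incidentally, str.count already counts non-overlapping occurrences, so
'555'.count('55') == 1 and '5555'.count('55') == 2 as well.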
I was surprised that I could read in the whole billion-digit file in
one gulp without running out of memory. Memory usage went to 80% (from
the usual 35%), but no higher, except at first, when I saw 98% for a
few seconds before it dropped back to 78-80% and stayed there.
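On a machine with less headroom, the same digit tally could be built
up chunk by chunk instead of in one gulp. A sketch (single digits only;
pair counts are trickier this way, because a pair can straddle a chunk
boundary):

from collections import Counter

# Accumulate digit counts ten million characters at a time, so the
# whole file never has to sit in memory at once.
counts = Counter()
with open('digits.txt') as f:
    while True:
        chunk = f.read(10**7)
        if not chunk:
            break
        counts.update(chunk)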
> The interesting question is, if you measure a deviation from the
> equality (and you will), is it statistically significant? If so, is
> it because of a problem with the random number generator, or with my
> algorithm for generating the sample digits?

I was pretty good at statistics long ago -- almost became a
statistician -- but I've pretty much lost what I had. Still, I'd bet
that the deviations I've seen so far are not significant.
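If someone wants to check that properly, the usual recipe is a
chi-squared goodness-of-fit test against the uniform expectation of
100,000,000 per digit. A sketch, reusing the counts from above (with
ten digits there are 9 degrees of freedom, so a statistic above about
16.9 would be suspicious at the 5% level, and above about 21.7 at 1%):

# Chi-squared statistic for "all ten digits are equally likely".
total = sum(counts[d] for d in '0123456789')
expected = total / 10.0
chi2 = sum((counts[d] - expected) ** 2 / expected
           for d in '0123456789')
print('chi-squared statistic:', chi2)

For anyone with SciPy handy,
scipy.stats.chisquare([counts[d] for d in '0123456789']) does the same
test and returns a p-value as well.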
Thanks for the stimulating challenge, Steve.

Dick