Steven D'Aprano wrote: > On Sun, 19 Oct 2008 04:38:04 +0000, Tim Roberts wrote: > >> Steven D'Aprano <[EMAIL PROTECTED]> wrote: >>> >>>On Fri, 17 Oct 2008 20:51:37 +1300, Lawrence D'Oliveiro wrote: >>> >>>> Is piece really meant to be random? If so, your create_random_block >>>> function isn't achieving much--xoring random data together isn't going >>>> to produce anything more exciting than less random data than you >>>> started with. >>> >>>Hmmm... why do you say that xoring random data with other random data >>>produces less randomness than you started with? >>> >>>I'm not saying that you're wrong, and certainly it is pointless since >>>you're not going to improve on the randomness of /dev/urandom without a >>>lot of work. But less random? >> >> For those who got a bit lost here, I'd would point out that Knuth[1] has >> an excellent chapter on random numbers that includes a detailed >> discussion of this effect. His net takeaway is that most of the things >> people do to increase randomness actually have exactly the opposite >> effect. > > I don't doubt it at all. But xoring random data with more random data? > I'm guessing that if the two sources of data are independent and from the > same distribution, then xoring them is pointless but not harmful. Here's > a rough-and-ready test which suggests there's little harm in it: > > >>>> import os, math >>>> def rand_data(size): > ... return [ord(c) for c in os.urandom(size)] > ... >>>> def mean(data): > ... return sum(data)/len(data) > ... >>>> def stdev(data): > ... return math.sqrt( mean([x**2 for x in data]) - mean(data)**2 ) > ... >>>> A = rand_data(1000) # good random data >>>> B = rand_data(1000) # more good random data >>>> AB = [a^b for (a,b) in zip(A, B)] # is this still good random data? >>>> assert len(AB) == len(A) == len(B) >>>> >>>> mean(A), stdev(A) > (126, 73.91887445030531) >>>> mean(B), stdev(B) > (128, 74.242844773082339) >>>> mean(AB), stdev(AB) > (129, 74.39085965358916) > > > Note: I wouldn't take the above terribly seriously. Mean and standard > deviation alone are terrible measures of the randomness of data. But this > does suggest that any deviation from uniform randomness will be quite > subtle. > > >
Operations like 'and' and 'or' will tend to destroy randomness. 'and' tends to the 0-string and 'or' tends to the 1-string. I feel like 'xor' should be safe (like Steven), but is the proof merely the half-and-half split of the truth table? -- http://mail.python.org/mailman/listinfo/python-list
