On Sun, 18 Jul 2010 08:30:05 pm Richard D. Moores wrote:
> > Taking the string '555', you should get two digraphs: 55_ and _55.
>
> That seems wrong to me. When I search on '999999' and there's a
> '9999999' I don't want to think I've found 2 instances of '999999'.
> But that's just my preference. Instances should be distinct, IMO,
> and not overlap.

I think we're talking about different things here. You're (apparently)
interested in searching for patterns, in which case looking for
non-overlapping patterns is perfectly fine. I'm talking about testing
the randomness of the generator by counting the frequency of digraphs
and trigraphs, in which case you absolutely do want them to overlap.
Otherwise, you're throwing away every second digraph, or two out of
every three trigraphs, which could potentially hide a lot of
non-randomness.

> >> I was surprised that I could read in the whole billion file with
> >> one gulp without running out of memory.
> >
> > Why? One billion bytes is less than a GB. It's a lot, but not
> > *that* much.
>
> I earlier reported that my laptop couldn't handle even 800 million.

What do you mean, "couldn't handle"? Couldn't handle 800 million of
what? Obviously not bytes, because your laptop *can* handle well over
800 million bytes. It has 4GB of memory, after all :)

There's a big difference in memory usage between (say):

    data = "1"*10**9  # a single string of one billion characters

and

    data = ["1"]*10**9  # a list of one billion items, each a pointer-sized reference

or even

    number = 10**(1000000000)-1  # a one billion digit longint

This is just an example, of course. As they say, the devil is in the
details.

> >> Memory usage went to 80% (from
> >> the usual 35%), but no higher except at first, when I saw 98% for
> >> a few seconds, and then a drop to 78-80% where it stayed.
> >
> > That suggests to me that your PC probably has 2GB of RAM. Am I
> > close?
>
> No. 4GB.

Interesting. Presumably the rest of the memory is being used by the
operating system and other running applications and background
processes.

-- 
Steven D'Aprano
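
P.S. To make the overlapping versus non-overlapping distinction concrete, here is a rough sketch of the two ways of counting. The function name count_ngraphs and the sample string are made up for illustration, and it assumes the digits are already in memory as a string. It uses collections.Counter, so it needs Python 2.7 or later.

```python
from collections import Counter

def count_ngraphs(digits, n, overlap=True):
    """Count the n-character groups (digraphs for n=2, trigraphs for n=3)
    in a string of digits.

    With overlap=True the window moves one character at a time, which is
    what a frequency test wants. With overlap=False it moves n characters
    at a time, so the groups are distinct.
    """
    step = 1 if overlap else n
    return Counter(digits[i:i + n]
                   for i in range(0, len(digits) - n + 1, step))

sample = "5551234555"  # hypothetical data, not from the real file
overlapping = count_ngraphs(sample, 2)
distinct = count_ngraphs(sample, 2, overlap=False)
```

For '555' the overlapping count gives '55' twice and the non-overlapping count gives it once. On the ten-character sample, overlapping sees nine digraphs and non-overlapping sees only five, which is the "throwing away every second digraph" effect.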