On Sun, Jul 18, 2010 at 02:26, Steven D'Aprano <st...@pearwood.info> wrote:
> On Sun, 18 Jul 2010 06:49:39 pm Richard D. Moores wrote:
>
>> I might try
>> trigraphs where the 2nd digit is 2 more than the first, and the third
>> 2 more than the 2nd. E.g. '024', '135', '791', '802'.
>
> Why the restriction? There's only 1000 different trigraphs (10*10*10),
> which is nothing.

Just to see if I could do it. It seemed interesting.

>> Or maybe I've
>> had enough. BTW Steve, my script avoids the problem you mentioned, of
>> counting 2 '55's in a '555' string. I get only one, but 2 in '5555'.
>
> Huh? What problem did I mention?

Sorry, that was Luke.

> Taking the string '555', you should get two digraphs: 55_ and _55.

That seems wrong to me. When I search on '999999' and there's a
'9999999' I don't want to think I've found 2 instances of '999999'.
But that's just my preference. Instances should be distinct, IMO, and
not overlap.

> In '5555' you should get three: 55__, _55_, __55. I'd do something like
> this (untested):
>
> trigraphs = {}
> f = open('digits')
> trigraph = f.read(3)  # read the first three digits
> trigraphs[trigraph] = 1
> while 1:
>     c = f.read(1)
>     if not c:
>         break
>     trigraph = trigraph[1:] + c
>     if trigraph in trigraphs:
>         trigraphs[trigraph] += 1
>     else:
>         trigraphs[trigraph] = 1
>
>> See line 18, in the while loop.
>>
>> I was surprised that I could read in the whole billion file with one
>> gulp without running out of memory.
>
> Why? One billion bytes is less than a GB. It's a lot, but not *that*
> much.

I earlier reported that my laptop couldn't handle even 800 million.

>> Memory usage went to 80% (from
>> the usual 35%), but no higher except at first, when I saw 98% for a
>> few seconds, and then a drop to 78-80% where it stayed.
>
> That suggests to me that your PC probably has 2GB of RAM. Am I close?

No. 4GB.

_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
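
For anyone following the overlap question, here is a minimal sketch of the two counting conventions discussed in this thread. The function names are made up for illustration and this is not the script mentioned above, which isn't shown here. It assumes the digits are already in a string. As far as I know, str.count() counts non-overlapping matches, which gives the "distinct instances" behaviour, while a sliding window gives the overlapping counts.

```python
def count_non_overlapping(digits, pattern):
    # str.count() resumes searching after the end of each match,
    # so matched instances never share characters.
    return digits.count(pattern)


def count_overlapping(digits, pattern):
    # Slide a window one character at a time, so matches may overlap.
    n = len(pattern)
    return sum(1 for i in range(len(digits) - n + 1)
               if digits[i:i + n] == pattern)


for s in ('555', '5555'):
    print(s, count_non_overlapping(s, '55'), count_overlapping(s, '55'))
```

With '555' the two functions should give 1 and 2, and with '5555' they should give 2 and 3, matching the numbers quoted above. Which one is "right" depends on what you want to count.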