On Oct 16, 2012, at 12:57 PM, Abhishek Pratap <abhishek....@gmail.com> wrote:

> Hi Guys
> 
> For my problem I need to store 400-800 million 20-character keys in a
> dictionary and do counting. This data structure takes about 60-100 GB
> of RAM.
> I am wondering if there are slick ways to map the dictionary to a file
> on disk and not store it in memory, but still access it as a dictionary
> object. Speed is not the main concern in this problem, and persistence
> is not needed as the counting will only be done once on the data. We
> want the script to run on smaller-memory machines if possible.
> 
> I did think about databases for this, but intuitively it looks like
> overkill because for each key you have to first check whether it is
> already present and increase the count by 1, and if not, then insert
> the key into the database.
> 
> Just want to take your opinion on this.
> 
> Thanks!
> -Abhi
> _______________________________________________
> Tutor maillist  -  Tutor@python.org
> To unsubscribe or change subscription options:
> http://mail.python.org/mailman/listinfo/tutor

With apologies for coming to this late.  Just a couple of comments -

1)  If the keys were sorted before the counting pass, all equal keys would be 
adjacent, so the counting could be run as a single streaming pass (or in 
successive batches) in as small a footprint as desired.
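For instance, here is a minimal sketch of that streaming count, assuming the sorted keys arrive one per line (the `count_sorted` helper and the sample data are mine, just for illustration):

```python
# Because equal keys are adjacent after sorting, each key's total can be
# emitted as soon as its run of duplicates ends -- only one key and one
# counter need to be in memory at any moment.
from itertools import groupby

def count_sorted(lines):
    """Yield (key, count) pairs from an iterable of sorted keys."""
    for key, run in groupby(line.strip() for line in lines):
        yield key, sum(1 for _ in run)

# A small in-memory list standing in for the sorted 60-100 GB file:
sorted_keys = ["AAA", "AAA", "BBB", "CCC", "CCC", "CCC"]
print(list(count_sorted(sorted_keys)))
# -> [('AAA', 2), ('BBB', 1), ('CCC', 3)]
```

In the real case you would pass an open file object instead of the list, and write each (key, count) pair out as it is produced rather than collecting them.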


2)  Sorting in limited memory was a highly developed art back in the 
early days of IBM mainframe computing (when one had to use tape rather than 
disk for temp storage and 64K, yes "K", was a lot of memory).  If you look at 
Knuth, The Art of Computer Programming, Volume 3: Sorting and Searching, you 
will find several external-sorting algorithms that would let the sort itself 
also run in as small a footprint as needed.
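The classic external merge sort from those days translates directly to the standard library. A minimal sketch, assuming one key per line and a chunk size tuned to available RAM (the function names and `chunk_size` default are my own choices):

```python
# External merge sort: sort memory-sized chunks, spill each sorted run to a
# temp file, then stream-merge the runs with heapq.merge, which never holds
# more than one line per run in memory.
import heapq
import os
import tempfile

def _write_run(chunk):
    """Sort one in-memory chunk and write it to a temp file; return its path."""
    chunk.sort()
    fd, path = tempfile.mkstemp(text=True)
    with os.fdopen(fd, "w") as f:
        f.writelines(key + "\n" for key in chunk)
    return path

def external_sort(keys, chunk_size=100_000):
    """Return an iterator over all keys in sorted order, using temp files."""
    runs = []
    chunk = []
    for key in keys:
        chunk.append(key)
        if len(chunk) >= chunk_size:
            runs.append(_write_run(chunk))
            chunk = []
    if chunk:
        runs.append(_write_run(chunk))
    files = [open(path) for path in runs]
    return (line.rstrip("\n") for line in heapq.merge(*files))

print(list(external_sort(["b", "c", "a", "b"], chunk_size=2)))
# -> ['a', 'b', 'b', 'c']
```

A production version would close and delete the temp files when done, but the shape of the algorithm is the same: the peak memory use is one chunk, regardless of the input size.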

Possibly not interesting, but just saying…

Bill

