On Oct 16, 2012, at 12:57 PM, Abhishek Pratap <abhishek....@gmail.com> wrote:
> Hi Guys
>
> For my problem I need to store 400-800 million 20-character keys in a
> dictionary and do counting. This data structure takes about 60-100 GB
> of RAM.
> I am wondering if there are slick ways to map the dictionary to a file
> on disk and not store it in memory, but still access it as a dictionary
> object. Speed is not the main concern in this problem, and persistence
> is not needed, as the counting will only be done once on the data. We
> want the script to run on smaller-memory machines if possible.
>
> I did think about databases for this, but intuitively it looks like
> overkill, because for each key you have to first check whether it is
> already present and increase the count by 1, and if not, insert
> the key into the database.
>
> Just want to take your opinion on this.
>
> Thanks!
> -Abhi
> _______________________________________________
> Tutor maillist - Tutor@python.org
> To unsubscribe or change subscription options:
> http://mail.python.org/mailman/listinfo/tutor

With apologies for coming to this late. Just a couple of comments:

1) If the keys were sorted before the counting operation took place, the
problem could be run in successive batches in as small a memory footprint
as desired: once equal keys sit next to each other, counting is a single
sequential pass, and only the current key and its running count need to be
held in memory at any moment.

2) Sorting in limited memory was a highly developed art back in the early
days of IBM mainframe computing (when one had to use tape rather than disk
for temporary storage, and 64K, yes "K", was a lot of memory). If you look
at Knuth, Volume III: Sorting and Searching, you will find several external
sorting algorithms that would allow the sort itself to run in as small a
footprint as needed.

Possibly not interesting, but just saying…

Bill
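To make point (1) concrete, here is a minimal sketch of the sort-then-count approach in Python's standard library: split the key file into sorted runs that each fit in memory, merge the runs with `heapq.merge`, and count adjacent duplicates with `itertools.groupby`. The file names and the `CHUNK` size are illustrative assumptions, not part of the original posts; it also assumes one fixed-length key per line with a trailing newline.

```python
# Sketch only: external sort into runs, then one sequential counting pass.
# CHUNK and the file names are assumptions -- tune CHUNK to available RAM.
import heapq
import itertools
import os
import tempfile

CHUNK = 1_000_000  # keys sorted in memory per run (assumption)

def sorted_runs(path):
    """Split the key file into sorted temporary run files."""
    runs = []
    with open(path) as f:
        while True:
            batch = list(itertools.islice(f, CHUNK))
            if not batch:
                break
            batch.sort()  # fixed-length keys, so line sort == key sort
            tmp = tempfile.NamedTemporaryFile("w", delete=False, suffix=".run")
            tmp.writelines(batch)
            tmp.close()
            runs.append(tmp.name)
    return runs

def count_keys(path, out_path):
    """Merge the sorted runs and write each key with its count."""
    runs = sorted_runs(path)
    files = [open(r) for r in runs]
    try:
        with open(out_path, "w") as out:
            # heapq.merge streams the runs in sorted order; groupby then
            # collapses each run of equal keys into one (key, count) line.
            for key, group in itertools.groupby(heapq.merge(*files)):
                out.write("%s\t%d\n" % (key.rstrip("\n"), sum(1 for _ in group)))
    finally:
        for f in files:
            f.close()
        for r in runs:
            os.remove(r)
```

Only one run's worth of keys is ever held in memory during the sort, and the counting pass holds just one key and its count, which is exactly why the footprint can be made as small as desired.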