On 16/10/12 17:57, Abhishek Pratap wrote:
> For my problem I need to store 400-800 million 20-character keys in a
> dictionary and do counting. This data structure takes about 60-100 GB
> of RAM.
That's a lot of records, but without details of what kind of counting you
plan to do we can't give specific advice.
> I am wondering if there are slick ways to map the dictionary to a file
> on disk and not store it in memory but still access it as a dictionary
> object. Speed is not the main concern.
The trivial solution is to use shelve since that makes a file look like
a dictionary. There are security issues (shelve is built on pickle, so
you should never open a shelf from an untrusted source) but they don't
sound like they'd be a problem here. I've no idea what the performance of
shelve would be like with that many records though...
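
A minimal sketch of the shelve approach, assuming the keys arrive as an
iterable of strings. The function names count_keys and read_count and
the file path are made up for illustration, and I have not tried this
at anything like 400 million records:

```python
import os
import shelve
import tempfile


def count_keys(keys, db_path):
    """Count occurrences of each key in a shelf stored at db_path."""
    # shelve.open creates the file(s) if they do not exist yet.
    with shelve.open(db_path) as counts:
        for key in keys:
            # Shelf keys must be strings; values are pickled on write.
            counts[key] = counts.get(key, 0) + 1


def read_count(db_path, key):
    """Return the stored count for key, or 0 if it was never seen."""
    with shelve.open(db_path) as counts:
        return counts.get(key, 0)


if __name__ == "__main__":
    # Hypothetical demo data; real keys would be read from your input file.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "counts")
        count_keys(["ACGT", "TTGA", "ACGT"], path)
        print(read_count(path, "ACGT"))
```

Every update goes through pickle and the underlying dbm file, so expect
it to be much slower than an in-memory dict.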
> I did think about databases for this but intuitively it looks like
> overkill, because for each key you have to first check whether it is
> already present and increase the count by 1, and if not then insert
> the key into the database.
The database does all of that automatically and fast.
You just need to set it up, load the data and use it - probably around
50 lines of SQL... And you don't need anything fancy for a single-table
database - Access, SQLite, even FoxPro...
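
As a rough illustration, here is what the single-table idea might look
like with the sqlite3 module from the standard library. The table name
counts, the column names key and n, and the function names are my own
choices for this sketch, not anything prescribed:

```python
import sqlite3


def open_counts(db_path):
    """Open (or create) a one-table database holding key counts."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS counts ("
        " key TEXT PRIMARY KEY,"
        " n INTEGER NOT NULL)"
    )
    return conn


def add_keys(conn, keys):
    """Increment the count of every key in keys, inserting new ones."""
    with conn:  # commits once at the end, rolls back on error
        for key in keys:
            # Insert the key with a zero count unless it already exists...
            conn.execute(
                "INSERT OR IGNORE INTO counts (key, n) VALUES (?, 0)", (key,)
            )
            # ...then bump the count; the primary key index finds the row.
            conn.execute("UPDATE counts SET n = n + 1 WHERE key = ?", (key,))


def get_count(conn, key):
    """Return the count for key, or 0 if the key is not in the table."""
    row = conn.execute("SELECT n FROM counts WHERE key = ?", (key,)).fetchone()
    return row[0] if row else 0


if __name__ == "__main__":
    conn = open_counts(":memory:")  # use a file path for real data
    add_keys(conn, ["ACGT", "TTGA", "ACGT"])
    print(get_count(conn, "ACGT"))
    conn.close()
```

The "check whether it is present, then insert or increment" step is the
two statements inside the loop; the database handles the lookup through
the primary key index. For a load this size you would probably want to
feed add_keys in batches so each transaction stays a manageable size.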
Or you could just create a big text file and process it line by line if
the data fits that model. Lots of options.
Personally I'd go with a database for speed, flexibility and ease of coding.
--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
_______________________________________________
Tutor maillist - Tutor@python.org