<aenea...@priest.com> wrote:

> I can get the code you wrote to work on my toy data, but my real
> input data is actually contained in 10 files that are about 1.5 GB
> each--when I try to run the code on one of those files, everything freezes.

For those kinds of volumes I'd go for a SQL database every time!
(SQLite might be OK, but I'd be tempted to go for something even
beefier, like MySQL, PostgreSQL or Firebird.)
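
Here's a minimal sketch of what loading one of those files into
SQLite might look like. The file, table and column names are made
up, and I'm assuming a two-column CSV with a header row - adjust to
suit your real data:

import csv
import sqlite3

conn = sqlite3.connect("mydata.db")   # hypothetical database file
conn.execute("CREATE TABLE IF NOT EXISTS records (key TEXT, value TEXT)")

with open("input1.csv", newline="") as f:   # one of your 10 files
    reader = csv.reader(f)
    next(reader)   # skip the header row
    # executemany() accepts a generator, so rows stream in one at a
    # time and the whole 1.5 GB file never sits in memory at once
    conn.executemany("INSERT INTO records VALUES (?, ?)",
                     (row[:2] for row in reader))

conn.commit()
conn.close()

Repeat that (or wrap it in a loop) for each of the 10 files and you
only pay the loading cost once.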

> To solve this, I tried just having the data write to a different csv file:

For huge data volumes, sequential files like CSV are always going
to be slow. You need random access, and a full-blown database
will probably be the best bet IMHO.
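
To show what I mean by random access, here's a sketch against the
same hypothetical table as above. The index is what lets SQLite jump
straight to the matching rows instead of reading everything start to
finish:

import sqlite3

conn = sqlite3.connect("mydata.db")

# An index turns lookups on this column into random access;
# without it, every query scans all the rows sequentially.
conn.execute("CREATE INDEX IF NOT EXISTS idx_key ON records (key)")

# Fetch only the rows you need - the index locates them directly.
for row in conn.execute("SELECT * FROM records WHERE key = ?",
                        ("some_key",)):
    print(row)

conn.close()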

> But my guess is that converting from one CSV to another isn't
> going to be as efficient as creating a shelve database.

A shelve is fine for very simple lookups, but it's still basically
a flat file. And the minute you need to access by anything
other than the single key, you are back to sequential processing.
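
To illustrate, a small sketch with made-up records (the keys and
field names are hypothetical):

import shelve

with shelve.open("mydata.shelf") as db:
    db["c1001"] = {"name": "Ann", "city": "Leeds"}
    db["c1002"] = {"name": "Bob", "city": "York"}

    # Lookup by the key is fast...
    print(db["c1001"])

    # ...but any other criterion means walking every record:
    hits = [rec for rec in db.values() if rec["city"] == "York"]
    print(hits)

With a SQL database you'd just add a WHERE clause (and an index)
instead of that final loop.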

HTH,

--
Alan Gauld
Author of the Learn to Program web site
http://www.alan-g.me.uk/

