Vincent,

On Friday 20 July 2007 15:35, Vincent Nijs wrote:
> Still curious however ... does no one on this list use (and like) sqlite?
First of all, while I'm not a heavy user of relational databases, I have used them as references for benchmarking purposes. Based on that experience, I'd say that, when writing, relational databases take a lot of safety measures to ensure that all the data written to disk is safe and that the data relationships don't get broken, and that takes time (a lot of time, in fact). I'm not sure whether some of these safety measures can be relaxed, but even if some relational databases allow it, my feeling (beware, I may be wrong) is that you won't be able to reach cPickle/PyTables speed. cPickle and PyTables do not take such safety measures because they were not designed for these tasks.

In this sense, the best writing speed I was able to achieve with Postgres (I don't know whether sqlite supports this) came from presenting the data as a file stream and loading it with the "cursor.copy_from()" method. With this approach I was able to speed up data injection by about 10x (if I remember well), but even so, PyTables can be another 10x faster. You can see an example of usage in the Postgres backend [1] used for the benchmarks comparing PyTables and Postgres speeds.

Regarding reading speed, my digging [2] seems to indicate that the bottleneck here is not related to safety, but to the fact that the Python APIs of relational databases wrap *every* element retrieved from the database in a Python object (int, float, string...). PyTables, on the contrary, creates an empty recarray as the container for all the retrieved data, which is very fast compared with the former approach. To quantify this effect as a function of the size of the retrieved dataset, see figure 14 of [3]: the larger the retrieved dataset, the larger the speed difference.
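To make the write and read techniques above a bit more concrete, here is a minimal Python sketch under stated assumptions. The write side assumes a psycopg2 cursor, whose `copy_from(file, table, sep, columns)` reads tab-separated text from a file-like object; the read side works with any DB-API cursor that offers `fetchmany()`, such as the one from the standard `sqlite3` module. The helper names `rows_to_stream`, `bulk_load` and `fetch_structured` are hypothetical; this is neither the code of the benchmark backend [1] nor how PyTables fills its recarrays internally.

```python
import io
from typing import Iterable, Sequence

import numpy as np


def rows_to_stream(rows: Iterable[Sequence[object]]) -> io.StringIO:
    """Serialize rows as tab-separated lines in an in-memory file.

    Assumption: no value is None and no value contains a tab, newline or
    backslash; real data would need COPY-style escaping and NULL markers.
    """
    buf = io.StringIO()
    for row in rows:
        buf.write("\t".join(str(v) for v in row) + "\n")
    buf.seek(0)  # rewind so the reader starts at the first row
    return buf


def bulk_load(cursor, table: str, columns: Sequence[str], rows) -> None:
    """Write side: one COPY for all rows instead of one INSERT per row.

    Assumption: `cursor` is a psycopg2 cursor (copy_from reads the stream).
    """
    cursor.copy_from(rows_to_stream(rows), table, sep="\t", columns=tuple(columns))


def fetch_structured(cursor, dtype: np.dtype, batch: int = 1000) -> np.ndarray:
    """Read side: pack the rows of an executed query into one structured array.

    The DB-API driver has already created a Python object for every value;
    this only copies them, a batch at a time, into compact NumPy storage.
    """
    chunks = []
    while True:
        rows = cursor.fetchmany(batch)  # list of tuples, possibly empty
        if not rows:
            break
        chunks.append(np.array(rows, dtype=dtype))
    if not chunks:
        return np.empty(0, dtype=dtype)
    return np.concatenate(chunks)
```

For instance, with an sqlite3 table `t (x INTEGER, y REAL)`, calling `fetch_structured(conn.execute("SELECT x, y FROM t"), np.dtype([("x", "i8"), ("y", "f8")]))` returns a single structured array with fields `x` and `y`. Note that the reading helper cannot remove the per-element wrapping cost discussed above, since the driver has already paid it; it only avoids keeping a large list of Python tuples around afterwards.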
Incidentally, and as said there, I hope that the makers of relational database wrappers will eventually discover NumPy containers, so that these wrapping times could be removed completely, but I'm currently not aware of any package taking this approach.

[1] http://www.pytables.org/trac/browser/trunk/bench/postgres_backend.py
[2] http://thread.gmane.org/gmane.comp.python.numeric.general/9704
[3] http://www.carabos.com/docs/OPSI-indexes.pdf

Cheers,

-- 
>0,0<   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 "-"

_______________________________________________
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion