Thanks Francesc! That does work much better:
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
PyTables version:  2.0
HDF5 version:      1.6.5
NumPy version:     1.0.4.dev3852
Zlib version:      1.2.3
BZIP2 version:     1.0.2 (30-Dec-2001)
Python version:    2.5.1 (r251:54869, Apr 18 2007, 22:08:04)
[GCC 4.0.1 (Apple Computer, Inc. build 5367)]
Platform:          darwin-Power Macintosh
Byte-ordering:     big
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Test saving recarray using cPickle:              1.620880 sec/pass
Test saving recarray with pytables:              2.074591 sec/pass
Test saving recarray with pytables (with zlib):  14.320498 sec/pass
Test loading recarray using cPickle:             1.023015 sec/pass
Test loading recarray with pytables:             0.882411 sec/pass
Test loading recarray with pytables (with zlib): 3.692698 sec/pass

On 7/20/07 6:17 AM, "Francesc Altet" <[EMAIL PROTECTED]> wrote:

> On Friday 20 July 2007 04:42, Vincent Nijs wrote:
>> I am interested in using sqlite (or pytables) to store data for scientific
>> research. I wrote the attached test program to save and load a simulated
>> 11x500,000 recarray. Average save and load times are given below (timeit
>> with 20 repetitions). The save time for sqlite is not really fair, because I
>> have to delete the data table each time before I create the new one. It is
>> still pretty slow in comparison, though. Loading the recarray from sqlite is
>> also significantly slower than with pytables or cPickle. I am hoping there
>> may be more efficient ways to save and load recarrays from/to sqlite than
>> what I am doing now. Note that I infer the variable names and types from
>> the data rather than specifying them manually.
>>
>> I'd love to hear from people using sqlite, pytables, and cPickle about
>> their experiences.
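For readers without the attached script, the cPickle leg of the benchmark can be sketched roughly as below. This is an assumption about the original program's structure, scaled down from 11x500,000 to a small recarray so it runs quickly; in Python 2.5 the import was `cPickle`, here replaced by the modern `pickle`.

```python
import os
import pickle  # was cPickle in the Python 2.5 original
import tempfile
import timeit

import numpy as np

# Simulated recarray: 11 named float columns (the thread used 500,000 rows)
N = 10_000
arr = np.rec.fromarrays(
    [np.random.rand(N) for _ in range(11)],
    names=[f"col{i}" for i in range(11)],
)

path = os.path.join(tempfile.mkdtemp(), "data.pickle")

def save_pickle(path):
    # Binary protocol matters: the default text protocol is far slower
    with open(path, "wb") as f:
        pickle.dump(arr, f, protocol=pickle.HIGHEST_PROTOCOL)

def load_pickle(path):
    with open(path, "rb") as f:
        return pickle.load(f)

reps = 3  # the thread used 20 repetitions
t_save = timeit.timeit(lambda: save_pickle(path), number=reps) / reps
loaded = load_pickle(path)
t_load = timeit.timeit(lambda: load_pickle(path), number=reps) / reps
print(f"saving recarray with pickle:  {t_save:.6f} sec/pass")
print(f"loading recarray with pickle: {t_load:.6f} sec/pass")
```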
>>
>> saving recarray with cPickle:   1.448568 sec/pass
>> saving recarray with pytables:  3.437228 sec/pass
>> saving recarray with sqlite:    193.286204 sec/pass
>>
>> loading recarray with cPickle:  0.471365 sec/pass
>> loading recarray with pytables: 0.692838 sec/pass
>> loading recarray with sqlite:   15.977018 sec/pass
>
> For a fairer comparison, and for large amounts of data, you should inform
> PyTables about the expected number of rows (see [1]) that you will end up
> feeding into the tables, so that it can choose the best chunksize for I/O
> purposes.
>
> I've redone the benchmarks (the new script is attached) with
> this 'optimization' on, and here are my numbers:
>
> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> PyTables version:  2.0
> HDF5 version:      1.6.5
> NumPy version:     1.0.3
> Zlib version:      1.2.3
> LZO version:       2.01 (Jun 27 2005)
> Python version:    2.5 (r25:51908, Nov 3 2006, 12:01:01)
> [GCC 4.0.2 20050901 (prerelease) (SUSE Linux)]
> Platform:          linux2-x86_64
> Byte-ordering:     little
> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> Test saving recarray using cPickle:              0.197113 sec/pass
> Test saving recarray with pytables:              0.234442 sec/pass
> Test saving recarray with pytables (with zlib):  1.973649 sec/pass
> Test saving recarray with pytables (with lzo):   0.925558 sec/pass
>
> Test loading recarray using cPickle:             0.151379 sec/pass
> Test loading recarray with pytables:             0.165399 sec/pass
> Test loading recarray with pytables (with zlib): 0.553251 sec/pass
> Test loading recarray with pytables (with lzo):  0.264417 sec/pass
>
> As you can see, the differences between raw cPickle and PyTables are much
> smaller than when PyTables is not told the total number of rows.
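The `expectedrows` hint Francesc describes might look like the sketch below. Note this uses the snake_case API of current PyTables releases (`open_file`, `create_table`) rather than the 2.0-era `openFile`/`createTable` from the thread; the table name and file path are illustrative.

```python
import os
import tempfile

import numpy as np
import tables  # PyTables

# Simulated recarray, scaled down from the 11x500,000 used in the thread
N = 50_000
arr = np.rec.fromarrays(
    [np.random.rand(N) for _ in range(11)],
    names=[f"col{i}" for i in range(11)],
)

path = os.path.join(tempfile.mkdtemp(), "data.h5")
with tables.open_file(path, mode="w") as h5:
    # expectedrows lets PyTables pick a sensible HDF5 chunksize up front
    # instead of tuning for its ~10,000-row default
    h5.create_table("/", "mytable", obj=arr, expectedrows=len(arr))

with tables.open_file(path) as h5:
    back = h5.root.mytable.read()
```

Without the hint, a 500,000-row append pattern gets a chunksize tuned for 10,000 rows, which is where the extra I/O cost in the first set of numbers comes from.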
> In fact, an automatic optimization could easily be done in PyTables: when
> the user passes a recarray, its total length would be compared with the
> default number of expected rows (currently 10000), and if the former is
> larger, the length of the recarray would be used instead.
>
> I have also added the times when using compression, in case you are
> interested in using it. Here are the final file sizes:
>
> $ ls -sh data
> total 132M
>  24M data-lzo.h5   43M data-None.h5   43M data.pickle   25M data-zlib.h5
>
> Of course, this is using completely random data; with real data the
> compression levels are expected to be higher than this.
>
> [1] http://www.pytables.org/docs/manual/ch05.html#expectedRowsOptim
>
> Cheers,

--
Vincent R. Nijs
Assistant Professor of Marketing
Kellogg School of Management, Northwestern University
2001 Sheridan Road, Evanston, IL 60208-2001
Phone: +1-847-491-4574  Fax: +1-847-491-2498
E-mail: [EMAIL PROTECTED]
Skype: vincentnijs

_______________________________________________
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion