[Rd] RSQLite indexing

Thomas Lumley Mon, 22 Oct 2007 14:55:13 -0700

I am trying to use RSQLite for storing data and  I need to create indexes on 
two variables in the table. It appears from searching the web that the CREATE 
INDEX operation in SQLite is relatively slow for large files, and this has been 
my experience as well.


The two index variables are crossed. One has about 350,000 levels [yes, it's 
genetic association data]. The other will have about 4000 levels eventually, 
but is up to about 100 now.   When the data were entered they were already 
ordered by this second index variable.

Creating the index took about an hour on the 100-level, presorted variable and 
about 12 hours on the 350,000-level unsorted variable.  I'm looking for advice 
on how to reduce this. Specifically
1/ would it be faster if the variable with more levels was the presorted one?
2/ would it be faster or slower if the index were created before adding all the 
data?
3/ are there any options that can be set to speed up the indexing?

The SQLite database will not be the primary archive for the data, so 
optimizations that are risky in the case of power loss or hardware failure are 
still acceptable.  Since Bioconductor seems to use SQLite a lot I'm hoping 
there is some simple solution.


     -thomas

Thomas Lumley                   Assoc. Professor, Biostatistics
[EMAIL PROTECTED]       University of Washington, Seattle

______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

[Rd] RSQLite indexing

Reply via email to