I have a large amount of data (about 120 GB) to index, so I want to improve the indexing performance. I went through the documentation on the Lucene website, which mentions various ways to speed indexing up.
I am working on Debian Linux on amd64, so very large files are supported, and my Java version is 1.6. I tried several of the points mentioned in that documentation but got unusual results:

1) Reusing Field and Document objects to reduce GC overhead, using the Field.setValue() method. Instead of speeding things up, this slowed indexing down drastically. I know this is unusual, but that is what happened.

2) Tuning parameters via setMergeFactor() and setMaxBufferedDocs(). The default value for both is 10. When I increased both to 1000, the number of .cfs files in the index folder increased many fold and I got java.io.IOException: Too many open files. If I keep the default value of 10 for both parameters, that error is avoided, but then the .fdt file in the index becomes really large.

So where am I going wrong? How do I overcome these problems and speed up my indexing process?
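For reference, this is roughly how I am combining the two techniques. It is a minimal sketch against the Lucene 2.4-era API (matching Java 1.6); the index path, field name, buffer/merge values, and the loadRecords() data source are placeholders, not my real code:

```java
import java.util.Arrays;
import java.util.List;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.FSDirectory;

public class BulkIndexer {

    // Placeholder standing in for the real 120 GB data source.
    static List<String> loadRecords() {
        return Arrays.asList("record one", "record two");
    }

    public static void main(String[] args) throws Exception {
        IndexWriter writer = new IndexWriter(
                FSDirectory.getDirectory("/path/to/index"),
                new StandardAnalyzer(), true,
                IndexWriter.MaxFieldLength.UNLIMITED);

        // Flush by RAM usage instead of a fixed document count.
        writer.setRAMBufferSizeMB(64.0);
        // Moderate merge factor: larger values mean fewer merges but
        // more simultaneous segments (and open file handles).
        writer.setMergeFactor(30);
        // Compound files (.cfs) keep the open-file count down at a
        // small indexing-speed cost.
        writer.setUseCompoundFile(true);

        // Reuse one Document/Field pair across all records, updating
        // the field contents in place via setValue().
        Document doc = new Document();
        Field contents = new Field("contents", "",
                Field.Store.NO, Field.Index.ANALYZED);
        doc.add(contents);

        for (String record : loadRecords()) {
            contents.setValue(record);
            writer.addDocument(doc);
        }
        writer.close();
    }
}
```

Is this the right way to apply these settings, or should they be configured differently for a bulk load of this size?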