I have a large amount of data (about 120 GB) to index, so I want to improve the indexing performance. I went through the documentation on the Lucene website, which mentions various ways to speed indexing up.
I am working on Debian Linux on amd64, so very large files are supported, and my Java version is 1.6. I tried several of the points mentioned in that documentation but got unusual results:

1) Reusing Field and Document objects to reduce GC overhead, using the Field.setValue() method. Instead of speeding things up, this slowed indexing down drastically. I know this is unusual, but that is what happened.

2) Tuning parameters via setMergeFactor() and setMaxBufferedDocs(). The default value for both is 10. When I increased both to 1000, the number of .cfs files in the index folder increased many fold and I got java.io.IOException: Too many open files. If I keep the default value of 10 for both parameters, that error is avoided, but then the .fdt file in the index becomes really large.

So where am I going wrong? How do I overcome these problems and speed up my indexing process?
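For reference, this is roughly how I am combining the two techniques. It is a minimal sketch against the Lucene 2.4-era API (matching Java 1.6); the index path, field name, buffer/merge values, and the loadRecords() data source are placeholders, not my real code:

```java
import java.util.Arrays;
import java.util.List;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.FSDirectory;

public class BulkIndexer {

    // Placeholder standing in for the real 120 GB data source.
    static List<String> loadRecords() {
        return Arrays.asList("record one", "record two");
    }

    public static void main(String[] args) throws Exception {
        IndexWriter writer = new IndexWriter(
                FSDirectory.getDirectory("/path/to/index"),
                new StandardAnalyzer(), true,
                IndexWriter.MaxFieldLength.UNLIMITED);

        // Flush by RAM usage instead of a fixed document count.
        writer.setRAMBufferSizeMB(64.0);
        // Moderate merge factor: larger values mean fewer merges but
        // more simultaneous segments (and open file handles).
        writer.setMergeFactor(30);
        // Compound files (.cfs) keep the open-file count down at a
        // small indexing-speed cost.
        writer.setUseCompoundFile(true);

        // Reuse one Document/Field pair across all records, updating
        // the field contents in place via setValue().
        Document doc = new Document();
        Field contents = new Field("contents", "",
                Field.Store.NO, Field.Index.ANALYZED);
        doc.add(contents);

        for (String record : loadRecords()) {
            contents.setValue(record);
            writer.addDocument(doc);
        }
        writer.close();
    }
}
```

Is this the right way to apply these settings, or should they be configured differently for a bulk load of this size?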