Hi,
The product I'm working on requires new documents to be searchable very quickly (inside 60 seconds is my goal). The corpus is also going to grow very large, although it is perfectly partitionable by user.

The approach I tried first was to have write-only masters and read-only slaves, with data replicated from masters to slaves on postCommit and postOptimize.

This made new documents visible within about 5 minutes (at least until the indexes grew so large that re-opening IndexSearchers took forever), but that is still not good enough.

Now, I am considering cutting out the commit / replicate / re-open cycle by augmenting Solr with a RAMDirectory per core.

Your thoughts on the following approach would be much appreciated:

Searches would be sent to both the RAMDirectory and the FSDirectory, while writes would go to the RAMDirectory only. The RAMDirectory would be flushed to the FSDirectory at regular intervals using IndexWriter.addIndexes (or addIndexesNoOptimize).

Effectively, I'd be creating a searchable queue in front of a regularly committed and optimised conventional index.
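To make the idea concrete, here is a minimal, self-contained sketch of the pattern in plain Java. It is not the Lucene or Solr API: the class `TieredIndex` and its methods are hypothetical, and two maps stand in for the RAMDirectory (receives all writes) and the FSDirectory (only receives documents via a flush). In Lucene itself, one plausible way to search both tiers at once would be a MultiReader over readers opened on each directory, but that is an assumption about how you might wire it, not something this sketch exercises.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: a small searchable in-memory tier in front of a larger
 * persistent tier. Plain maps stand in for a RAMDirectory and an FSDirectory;
 * this is NOT the Lucene or Solr API.
 */
public class TieredIndexDemo {

    public static final class TieredIndex {
        // Stand-in for the RAMDirectory: receives every new write.
        private final Map<String, String> ramTier = new LinkedHashMap<>();
        // Stand-in for the FSDirectory: only receives documents via flush().
        private final Map<String, String> diskTier = new LinkedHashMap<>();

        /** Writes go to the in-memory tier only, so they are searchable immediately. */
        public synchronized void add(String id, String text) {
            ramTier.put(id, text);
        }

        /** Searches go to both tiers; a RAM copy of a document hides its older disk copy. */
        public synchronized List<String> search(String term) {
            List<String> hits = new ArrayList<>();
            collect(diskTier, ramTier, term, hits);
            collect(ramTier, Collections.emptyMap(), term, hits);
            return hits;
        }

        /**
         * Moves everything from the in-memory tier into the persistent tier
         * (loosely analogous to IndexWriter.addIndexes followed by clearing the
         * RAMDirectory). Holding the lock means a search never sees a document
         * in both tiers, or in neither, mid-flush.
         */
        public synchronized int flush() {
            int moved = ramTier.size();
            diskTier.putAll(ramTier); // newer RAM copies overwrite older disk copies
            ramTier.clear();
            return moved;
        }

        public synchronized int ramSize() { return ramTier.size(); }
        public synchronized int diskSize() { return diskTier.size(); }

        // Naive whitespace-token match; a real index would use an inverted index.
        private static void collect(Map<String, String> tier, Map<String, String> shadow,
                                    String term, List<String> hits) {
            String wanted = term.toLowerCase();
            for (Map.Entry<String, String> e : tier.entrySet()) {
                if (shadow.containsKey(e.getKey())) {
                    continue; // a newer version of this document exists in the RAM tier
                }
                for (String token : e.getValue().toLowerCase().split("\\s+")) {
                    if (token.equals(wanted)) {
                        hits.add(e.getKey());
                        break;
                    }
                }
            }
        }
    }

    public static void main(String[] args) {
        TieredIndex index = new TieredIndex();
        index.add("doc1", "old archived report");
        index.flush();                                  // doc1 now lives in the persistent tier
        index.add("doc2", "fresh report");              // doc2 is searchable before any flush
        System.out.println(index.search("report"));     // [doc1, doc2]
        index.flush();
        System.out.println(index.ramSize() + " " + index.diskSize()); // 0 2
    }
}
```

The single lock keeps the sketch obviously correct but serialises searches against flushes; a real implementation would more likely swap in a freshly opened reader after the flush completes so searches are never blocked. The shadowing check is there because an update written to the RAM tier would otherwise be returned alongside its stale copy in the persistent tier.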

As this seems to be a useful pattern (and is mentioned tangentially in Lucene in Action), is there already support for this in Lucene?

Thanks,
James
