Strategy for presenting fresh data

James Brady Wed, 11 Jun 2008 20:25:13 -0700

Hi,

The product I'm working on requires new documents to be searchable very quickly (inside 60 seconds is my goal). The corpus is also going to grow very large, although it is perfectly partitionable by user.
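Because the corpus partitions cleanly by user, every document and every query for a given user can be routed to the same core. The following is a minimal sketch of that routing, assuming a fixed, hypothetical list of core names and simple hash-modulo placement. It is plain Java, not a Solr API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical router: maps each user to one of a fixed set of cores. */
public class UserPartitioner {
    private final List<String> coreNames;

    public UserPartitioner(List<String> coreNames) {
        if (coreNames == null || coreNames.isEmpty()) {
            throw new IllegalArgumentException("at least one core is required");
        }
        this.coreNames = new ArrayList<>(coreNames); // defensive copy
    }

    /** The same user always maps to the same core, for both indexing and searching. */
    public String coreFor(String userId) {
        // floorMod keeps the slot non-negative even when hashCode() is negative.
        int slot = Math.floorMod(userId.hashCode(), coreNames.size());
        return coreNames.get(slot);
    }

    public static void main(String[] args) {
        // Core names are made up for illustration.
        UserPartitioner router =
                new UserPartitioner(Arrays.asList("user-core-0", "user-core-1", "user-core-2"));
        System.out.println(router.coreFor("alice"));
        System.out.println(router.coreFor("bob"));
    }
}
```

`String.hashCode()` is defined by the Java specification, so the mapping is stable across JVMs. The trade-off of modulo placement is that changing the number of cores remaps most users, which would mean moving their documents.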

The approach I tried first was to have write-only masters and read-only slaves, with data replicated from master to slaves on postCommit and postOptimize.

This allowed new documents to become visible within 5 minutes or so (until the indexes got so large that re-opening IndexSearchers took forever, that is...), but that was still not good enough.

Now I am considering cutting out the commit / replicate / re-open cycle by augmenting Solr with a RAMDirectory per core.

Your thoughts on the following approach would be much appreciated:

Searches would be forked to both the RAMDirectory and the FSDirectory, while writes would go to the RAMDirectory only. The RAMDirectory would be flushed back to the FSDirectory regularly, using IndexWriter.addIndexes (or addIndexesNoOptimize).

Effectively, I'd be creating a searchable queue in front of a regularly committed and optimised conventional index.
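To make the proposed flow concrete, here is a toy model of the "searchable queue" using only the Java standard library. It is a sketch under stated assumptions, not Lucene code: `ToyIndex` is a hypothetical stand-in for a Lucene Directory with a trivial substring match, `ramIndex` plays the RAMDirectory role, `diskIndex` plays the FSDirectory role, and `flush()` stands in for the IndexWriter.addIndexes call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Toy model (not Lucene API): a small in-memory index in front of a larger main index. */
public class FreshIndexSketch {

    /** Hypothetical stand-in for a Lucene Directory: a document list with substring search. */
    static final class ToyIndex {
        private final List<String> docs = new ArrayList<>();

        void add(String doc) { docs.add(doc); }
        void addAll(ToyIndex other) { docs.addAll(other.docs); }
        void clear() { docs.clear(); }
        int size() { return docs.size(); }

        List<String> search(String term) {
            String needle = term.toLowerCase(Locale.ROOT);
            List<String> hits = new ArrayList<>();
            for (String d : docs) {
                if (d.toLowerCase(Locale.ROOT).contains(needle)) {
                    hits.add(d);
                }
            }
            return hits;
        }
    }

    private final ToyIndex ramIndex = new ToyIndex();  // RAMDirectory role
    private final ToyIndex diskIndex = new ToyIndex(); // FSDirectory role

    /** Writes go to the in-memory index only, so they are searchable immediately. */
    public synchronized void addDocument(String doc) { ramIndex.add(doc); }

    /** Searches are forked to both indexes; main-index hits come first, then fresh ones. */
    public synchronized List<String> search(String term) {
        List<String> hits = new ArrayList<>(diskIndex.search(term));
        hits.addAll(ramIndex.search(term));
        return hits;
    }

    /**
     * Periodic flush: the analogue of addIndexes followed by emptying the RAM buffer.
     * Doing both under one lock means a concurrent search never sees a document twice
     * or misses it during the hand-off.
     */
    public synchronized void flush() {
        diskIndex.addAll(ramIndex);
        ramIndex.clear();
    }

    public synchronized int ramSize() { return ramIndex.size(); }
    public synchronized int diskSize() { return diskIndex.size(); }

    public static void main(String[] args) {
        FreshIndexSketch idx = new FreshIndexSketch();
        idx.addDocument("solr replication notes");
        System.out.println(idx.search("solr")); // visible before any flush
        idx.flush();
        idx.addDocument("fresh solr doc");
        System.out.println(idx.search("solr")); // one hit from each index
    }
}
```

In the real setup the hand-off is the delicate part: the searcher over the FSDirectory has to be reopened so it sees the merged documents before the RAMDirectory is cleared, otherwise documents can briefly disappear from results or appear twice. In Lucene, searching several indexes as one is typically done with a MultiReader over both directories.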

As this seems to be a useful pattern (and is mentioned tangentially in Lucene in Action), is there already support for this in Lucene?

Thanks,
James
