Solr4x: Separate Indexer and Query Instances for Performance

Mike Schultz Tue, 05 Mar 2013 15:45:16 -0800

Solr 3x had a master/slave architecture, which meant that indexing did not
happen in the same process as querying; in fact, normally not even on the
same machine.  The querier only needed to copy down snapshots of the new
index files and commit them.  That gave great isolation for maximum query
and indexing performance.  In Solr 4x this is gone.  Does anyone have any
answers or tuning approaches to address this?

We have a high-query-load, high-indexing-load environment. I see TP99 query
latency go from under 100 ms to 4-10 seconds during indexing. Even TP90 hits
2 seconds. Looking at GC in VisualVM, I see a pretty sawtooth turn into
a scraggly forest when indexing happens and the eden space gets burned
through.

It seems like one approach is to have the shard leaders replicate (a la 3x)
to their replicas instead of sending them the document stream. I know the
replicas do that when they get "too far behind", so this would simply mean
always doing that at some given interval. This would make it possible to
only put replicas into a query load balancer. In the event of a leader
failure, a replica would be promoted and you'd have to deal with it, but
it'd be no worse than what is now steady-state in standard 4x.
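
To make the load-balancer half of that idea concrete, here is a minimal
sketch of a round-robin pool that only hands out non-leader nodes for
queries. The Node and ReplicaOnlyPool classes and the host URLs are
hypothetical illustrations, not part of any Solr API, and the sketch assumes
the caller already knows which node in each shard is the leader.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical description of one node in a shard; not a Solr class. */
final class Node {
    final String url;
    final boolean leader;

    Node(String url, boolean leader) {
        this.url = url;
        this.leader = leader;
    }
}

/** Round-robin pool that serves queries from replicas only, never the leader. */
final class ReplicaOnlyPool {
    private final List<Node> replicas = new ArrayList<>();
    private final AtomicInteger next = new AtomicInteger();

    ReplicaOnlyPool(List<Node> shardNodes) {
        // Keep only the nodes that are not currently the shard leader.
        for (Node n : shardNodes) {
            if (!n.leader) {
                replicas.add(n);
            }
        }
        if (replicas.isEmpty()) {
            throw new IllegalArgumentException("no replicas available for queries");
        }
    }

    /** Returns the URL of the next replica in round-robin order. */
    String pick() {
        // floorMod keeps the index valid even after the counter overflows.
        int i = Math.floorMod(next.getAndIncrement(), replicas.size());
        return replicas.get(i).url;
    }
}
```

The sketch does not handle leader failure: if a replica were promoted, the
pool would have to be rebuilt from the new shard state, which is the "deal
with it" case mentioned above.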

Another approach might be to have separate Solr instances point to the same
index directory. One instance would be used for indexing and tuned for
that, the other tuned for querying. It's not like having the operations on
separate machines as in 3x, but it would still be better isolation than
standard 4x.
Would this at least work in theory, if say the query instance started up a
new IndexSearcher when necessary?

Any insight, advice or experience on this appreciated.

Mike

--
View this message in context:
http://lucene.472066.n3.nabble.com/Solr4x-Separate-Indexer-and-Query-Instances-for-Performance-tp4045035.html
Sent from the Solr - User mailing list archive at Nabble.com.
