Re: Solr4x: Separate Indexer and Query Instances for Performance

Mark Miller Tue, 05 Mar 2013 15:59:18 -0800

On Mar 5, 2013, at 3:44 PM, Mike Schultz <mike.schu...@gmail.com> wrote:


> Solr 3x had a master/slave architecture which meant that indexing did not
> happen in the same process as querying, in fact normally not even on the
> same machine.  The querier only needed to copy down snapshots of the new
> index files and commit them.  Great isolation for maximum query performance
> and indexing performance.  Now in Solr4x this is gone.  Does anyone have any
> answer or tuning approaches to address this?

No, it's not gone. You can still use the old master/slave model in 4x if you
want.

> 
> We have a high query load, high indexing load environment.  I see TP99 query
> latency go from under 100 ms to 4-10 seconds during indexing.  Even TP90 hits
> 2 seconds.  Looking at GC in VisualVM, I see a pretty sawtooth turn into
> a scraggly forest when indexing happens and the eden space gets burned
> through.
> 
> It seems like one approach is to have the shard leaders replicate (a la 3x)
> to their replicas instead of sending them the document stream.  I know the
> replicas do that when they get "too far behind", so this would simply mean,
> always doing that at some given interval.  This would make it possible to
> only put replicas into a query load balancer.  In the event of a leader
> failure, a replica would be promoted and you'd have to deal with it, but
> it'd be no worse than what is now steady-state in standard 4x.

You can't really do this without losing important SolrCloud features like 
durability and such. In SolrCloud, replication only happens when a node is in 
recovery mode - during this time it's buffering updates and not involved in 
searches.

> 
> Another approach might be to have separate Solr instances point to the same
> index directory.  One instance is used for indexing and tuned for that, the
> other tuned for querying.  It's not like having the operations on separate
> machines as in 3x, but it would still be better isolation than standard 4x.
> Would this at least work in theory, if say the query instance started up a
> new IndexSearcher when necessary?
> 
> Any insight, advice or experience on this appreciated.
> 
> Mike

It's basically a trade-off at the moment: use master/slave with 4x and get
this isolation, or use SolrCloud and get its other benefits.
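
For reference, the master/slave option in 4x is set up through the
ReplicationHandler in each core's solrconfig.xml. The fragments below are a
minimal sketch only, not a tuned setup: the host name "indexer-host", the core
name "collection1", the one-minute poll interval and the confFiles list are
all assumptions made for illustration.

On the indexing (master) instance:

```xml
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <!-- publish a new index version to slaves after each commit and at startup -->
    <str name="replicateAfter">commit</str>
    <str name="replicateAfter">startup</str>
    <!-- config files to ship along with the index (illustrative list) -->
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
</requestHandler>
```

On each query (slave) instance:

```xml
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <!-- hypothetical master host and core name; replace with your own -->
    <str name="masterUrl">http://indexer-host:8983/solr/collection1/replication</str>
    <!-- how often to poll the master for a new index version (HH:mm:ss) -->
    <str name="pollInterval">00:01:00</str>
  </lst>
</requestHandler>
```

With a layout like this, only the slave instances would go behind the query
load balancer, and all updates would be sent to the master.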

One possible future optimization with SolrCloud may be to send pre-analyzed
docs to the replicas. Just a possibility though.


You could look at tuning GC and/or other settings to make things better. I
think I've certainly seen this hold up better than you describe in the past.
I'm sure this depends on a lot of factors though (data size, hardware, RAM,
etc.).

Mark
