Solr 3x had a master/slave architecture, which meant that indexing did not happen in the same process as querying; in fact, normally not even on the same machine. The querier only needed to copy down snapshots of the new index files and commit them. That gave great isolation for both query performance and indexing performance. In Solr 4x this is gone. Does anyone have any answers or tuning approaches to address this?
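For reference, the 3x-style pull described above goes through the ReplicationHandler HTTP API. Below is a minimal sketch that only builds the request URLs (it does not contact any server). The host names and the core name `collection1` are hypothetical placeholders, and the helper `replication_url` is made up for illustration; `indexversion`, `fetchindex` and the `masterUrl` parameter are the handler's documented commands and option.

```python
from urllib.parse import urlencode


def replication_url(core_url, command, **params):
    """Build a ReplicationHandler request URL for one Solr core."""
    query = {"command": command, "wt": "json"}
    query.update(params)
    return core_url.rstrip("/") + "/replication?" + urlencode(query)


# Hypothetical hosts and core name, for illustration only.
MASTER = "http://indexer.example.com:8983/solr/collection1"
SLAVE = "http://query1.example.com:8983/solr/collection1"

# Ask the indexing node which index version it currently exposes.
check_url = replication_url(MASTER, "indexversion")

# Tell the query node to pull the latest index files from the indexing node.
pull_url = replication_url(SLAVE, "fetchindex", masterUrl=MASTER + "/replication")

print(check_url)
print(pull_url)
```

Requesting the second URL on a schedule is roughly what a configured poll interval does on a 3x slave; whether forcing this inside a 4x cloud setup is safe is exactly the open question here.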
We have a high query load, high indexing load environment. I see TP99 query latency go from under 100 ms to 4-10 seconds during indexing. Even TP90 hits 2 seconds. Looking at GC in VisualVM, I see a pretty sawtooth turn into a scraggly forest when indexing happens and the eden space gets burned through.

It seems like one approach is to have the shard leaders replicate (a la 3x) to their replicas instead of sending them the document stream. I know the replicas do that when they get "too far behind", so this would simply mean always doing that at some given interval. This would make it possible to put only replicas into a query load balancer. In the event of a leader failure, a replica would be promoted and would then have to handle indexing as well, but that would be no worse than what is now steady-state in standard 4x.

Another approach might be to have separate Solr instances point to the same index directory. One instance is used for indexing and tuned for that, the other tuned for querying. It's not like having the operations on separate machines as in 3x, but it would still be better isolation than standard 4x. Would this at least work in theory, if, say, the query instance started up a new IndexSearcher when necessary?

Any insight, advice or experience on this is appreciated.

Mike