Hi Solr Gurus,

We have Solr in a 1-master, 2-slave configuration. A snapshot is created after
every commit and after every optimize, and autocommit fires after 50 documents
or 5 minutes. The snapshot puller runs from cron every 10 minutes. What we have
observed is that whenever a snapshot is installed on a slave, the SolrJ client
querying that slave times out, and the slave server shows high CPU usage and a
high load average. If we stop the snapshot puller, the slaves work with no
issues. The system has been running for 2 months, and this issue has only
started to occur now that load on the website is increasing.
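For reference, taking a snapshot after every commit and every optimize is
normally wired up on the master with RunExecutableListener hooks in
solrconfig.xml, roughly as in the fragment below. This is a sketch based on the
stock Solr 1.3 example config; the snapshooter path and "dir" value are
placeholders, not copied from our actual file.

```xml
<!-- Master only: run snapshooter after every commit and every optimize.
     Paths are placeholders; adjust to the local install. -->
<updateHandler class="solr.DirectUpdateHandler2">
  <listener event="postCommit" class="solr.RunExecutableListener">
    <str name="exe">solr/bin/snapshooter</str>
    <str name="dir">.</str>
    <!-- block until snapshooter has finished -->
    <bool name="wait">true</bool>
  </listener>
  <listener event="postOptimize" class="solr.RunExecutableListener">
    <str name="exe">solr/bin/snapshooter</str>
    <str name="dir">.</str>
    <bool name="wait">true</bool>
  </listener>
</updateHandler>
```

With a commit at most every 5 minutes and a pull every 10 minutes, nearly every
pull finds at least one new snapshot to install.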

Here are some details:

Solr Details:
- Apache Solr version: 1.3.0
- Lucene version: 2.4-dev

Master/Slave configurations:

Master:
- data is indexed by sending HTTP requests to the Solr server
- autocommit is enabled for 50 docs or 5 minutes, whichever comes first
- caching is disabled on this server
- mergeFactor is set to 10
- we were running the optimize script every 2 hours; we have since reduced
this to twice a day, but the issue still persists
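In solrconfig.xml terms, the autocommit and merge settings listed above would
look roughly like the following fragment. This is a sketch of the standard
Solr 1.3 elements, not a copy of our actual file; note that maxTime is given in
milliseconds.

```xml
<!-- Master: autocommit after 50 added docs or 5 minutes, whichever comes first -->
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxDocs>50</maxDocs>
    <!-- 5 minutes = 300000 ms -->
    <maxTime>300000</maxTime>
  </autoCommit>
</updateHandler>

<!-- Merge policy for the main index -->
<mainIndex>
  <mergeFactor>10</mergeFactor>
</mainIndex>
```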

Slave1/Slave2:
- the standard requestHandler is used
- default cache settings are used

Machine Specifications:

Master:
- 4GB RAM
- 1GB JVM Heap memory is allocated to Solr

Slave1/Slave2:
- 4GB RAM
- 2GB JVM Heap memory is allocated to Solr

Master and Slave1 (solr1) are on a single box, and Slave2 (solr2) is on a
different box. We use HAProxy to load balance query requests between the 2
slaves. The master is only used for indexing.

Please let us know if anybody has faced a similar issue or has any insight into
it, as we are stuck at the moment with a very unstable production environment.

As a workaround, we have started running optimize on the master every 7
minutes. This seems to have reduced the severity of the problem, but the issue
still occurs every 2 days. Please suggest what the root cause could be.

Thanks,
Bipul
