Hi Shawn,

Here are the answers to your questions.
1. Yes, we are aware of the fault-tolerance limitations of our architecture, but this is our dev environment, so we are running in SolrCloud mode with a limited number of machines.

2. Solr runs as a separate application; it is not deployed on WebLogic. We use WebLogic for the REST services, which in turn connect to ZooKeeper <--> Solr.

3. We used jconsole to monitor the Solr, ZooKeeper and WebLogic processes. In the WebLogic process it looks like threads are getting stuck. One such ZooKeeper-related thread is shown below.

Name: zkConnectionManagerCallback-9207-thread-1
State: WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@396cda76
Total blocked: 0  Total waited: 1

Stack trace:
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)

I have attached a file containing snapshots of the processes, and also the Solr GC log report from the load activity.

Attachments:
GCeasy-report-gc.pdf <http://lucene.472066.n3.nabble.com/file/t493329/GCeasy-report-gc.pdf>
TimoutIssue.docx <http://lucene.472066.n3.nabble.com/file/t493329/TimoutIssue.docx>
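
For comparison, the frames in that stack (ThreadPoolExecutor.getTask calling LinkedBlockingQueue.take) are also what a pool worker shows while it simply waits for its next task. The sketch below is only an illustration under stated assumptions: it uses a plain java.util.concurrent single-thread executor, not the Solr or ZooKeeper client classes, and the class name IdleWorkerDemo and its Snapshot helper are made up for this example. It runs one trivial task and then inspects the worker once it has gone idle.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical demo class; not part of Solr, ZooKeeper or WebLogic.
public class IdleWorkerDemo {

    /** State of the worker thread plus whether it is parked in LinkedBlockingQueue.take(). */
    static final class Snapshot {
        final Thread.State state;
        final boolean parkedInQueueTake;

        Snapshot(Thread.State state, boolean parkedInQueueTake) {
            this.state = state;
            this.parkedInQueueTake = parkedInQueueTake;
        }
    }

    /** Runs one trivial task, waits for the worker to go idle, then inspects it. */
    static Snapshot inspectIdleWorker() throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            AtomicReference<Thread> worker = new AtomicReference<>();
            // Capture the worker thread from inside the task and wait for completion.
            pool.submit(() -> worker.set(Thread.currentThread())).get();
            Thread t = worker.get();

            // Give the worker up to roughly two seconds to park on the empty queue.
            for (int i = 0; i < 200 && t.getState() != Thread.State.WAITING; i++) {
                Thread.sleep(10);
            }

            // Look for the LinkedBlockingQueue.take frame in the worker's stack.
            boolean inTake = false;
            for (StackTraceElement frame : t.getStackTrace()) {
                if ("java.util.concurrent.LinkedBlockingQueue".equals(frame.getClassName())
                        && "take".equals(frame.getMethodName())) {
                    inTake = true;
                }
            }
            return new Snapshot(t.getState(), inTake);
        } finally {
            pool.shutdownNow(); // interrupt the idle worker so the JVM can exit
        }
    }

    public static void main(String[] args) throws Exception {
        Snapshot s = inspectIdleWorker();
        System.out.println(s.state + " parkedInQueueTake=" + s.parkedInQueueTake);
    }
}
```

If the worker reports WAITING with the take frame present, the thread is idle rather than blocked on a lock, so a thread with this stack may not be the one holding up requests. Threads in BLOCKED state, or waiting inside application or network calls, are usually the more useful ones to look at in the dump.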