Hi Shawn,

One thing I forgot to mention is that the same setup (with no bootstrap) is working fine in our QA1 environment. I did not have the bootstrap option from the start; I added it thinking it would solve the problem.

Nonetheless, I followed Shawn's instructions wherever they differed from my old approach:

1. I moved my zkHost from the JVM to solr.xml and added a chroot to it.
2. I removed the bootstrap option.
3. I created the collections with the suggested URL template (I had tried that earlier too).

None of it worked for me; I am seeing the same errors. I am adding some more logs from before and after the error occurs:

-------------------------------------------------
INFO - 2013-11-02 17:40:40.427; org.apache.solr.update.DefaultSolrCoreState; closing IndexWriter with IndexWriterCloser
INFO - 2013-11-02 17:40:40.428; org.apache.solr.core.SolrCore; [xyz] Closing main searcher on request.
INFO - 2013-11-02 17:40:40.431; org.apache.solr.core.CachingDirectoryFactory; Closing NRTCachingDirectoryFactory - 1 directories currently being tracked
INFO - 2013-11-02 17:40:40.432; org.apache.solr.core.CachingDirectoryFactory; looking to close /mnt/emc/App_name/data-UAT-refresh/SolrCloud/SolrHome2/solr/xyz/data [CachedDir<<refCount=0;path=/mnt/emc/App_name/data-UAT-refresh/SolrCloud/SolrHome2/solr/xyz/data;done=false>>]
INFO - 2013-11-02 17:40:40.432; org.apache.solr.core.CachingDirectoryFactory; Closing directory: /mnt/emc/App_name/data-UAT-refresh/SolrCloud/SolrHome2/solr/xyz/data
ERROR - 2013-11-02 17:40:40.433; org.apache.solr.core.CoreContainer; Unable to create core: xyz
org.apache.solr.common.SolrException: Error opening new searcher
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:834)
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:625)
    at org.apache.solr.core.ZkContainer.createFromZk(ZkContainer.java:256)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:555)
    at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:247)
    at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:239)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)
Caused by: org.apache.solr.common.SolrException: Error opening new searcher
    at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1477)
    at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1589)
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:821)
    ... 13 more
Caused by: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: NativeFSLock@/mnt/emc/App_name/data-UAT-refresh/SolrCloud/SolrHome2/solr/xyz/data/index/write.lock
    at org.apache.lucene.store.Lock.obtain(Lock.java:84)
    at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:695)
    at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:77)
    at org.apache.solr.update.SolrIndexWriter.create(SolrIndexWriter.java:64)
    at org.apache.solr.update.DefaultSolrCoreState.createMainIndexWriter(DefaultSolrCoreState.java:267)
    at org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:110)
    at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1440)
    ... 15 more
ERROR - 2013-11-02 17:40:40.443; org.apache.solr.common.SolrException; null:org.apache.solr.common.SolrException: Unable to create core: xyz
    at org.apache.solr.core.CoreContainer.recordAndThrow(CoreContainer.java:934)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:566)
    at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:247)
    at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:239)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)
Caused by: org.apache.solr.common.SolrException: Error opening new searcher
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:834)
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:625)
    at org.apache.solr.core.ZkContainer.createFromZk(ZkContainer.java:256)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:555)
    ... 10 more
Caused by: org.apache.solr.common.SolrException: Error opening new searcher
    at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1477)
    at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1589)
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:821)
    ... 13 more
Caused by: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: NativeFSLock@/mnt/emc/App_name/data-UAT-refresh/SolrCloud/SolrHome2/solr/xyz/data/index/write.lock
    at org.apache.lucene.store.Lock.obtain(Lock.java:84)
    at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:695)
    at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:77)
    at org.apache.solr.update.SolrIndexWriter.create(SolrIndexWriter.java:64)
    at org.apache.solr.update.DefaultSolrCoreState.createMainIndexWriter(DefaultSolrCoreState.java:267)
    at org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:110)
    at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1440)
    ... 15 more
INFO - 2013-11-02 17:40:40.445; org.apache.solr.servlet.SolrDispatchFilter; user.dir=/usr/wbol/glassfish3/glassfish/nodes/UAT-refresh-App_name-app02/SolrCloud_02/config
INFO - 2013-11-02 17:40:40.446; org.apache.solr.servlet.SolrDispatchFilter; SolrDispatchFilter.init() done
ERROR - 2013-11-02 17:40:40.609; org.apache.solr.update.SolrIndexWriter; SolrIndexWriter was not closed prior to finalize(), indicates a bug -- POSSIBLE RESOURCE LEAK!!!
ERROR - 2013-11-02 17:40:40.627; org.apache.solr.update.SolrIndexWriter; Error closing IndexWriter, trying rollback
java.lang.NullPointerException
    at org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:984)
    at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:945)
    at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:907)
    at org.apache.solr.update.SolrIndexWriter.close(SolrIndexWriter.java:132)
    at org.apache.solr.update.SolrIndexWriter.finalize(SolrIndexWriter.java:185)
    at java.lang.ref.Finalizer.invokeFinalizeMethod(Native Method)
    at java.lang.ref.Finalizer.runFinalizer(Finalizer.java:83)
    at java.lang.ref.Finalizer.access$100(Finalizer.java:14)
    at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:160)
INFO - 2013-11-02 17:40:41.928; org.apache.solr.common.cloud.ZkStateReader$2; A cluster state change: WatchedEvent state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred - updating... (live nodes size: 4)
INFO - 2013-11-02 17:40:41.928; org.apache.solr.common.cloud.ZkStateReader$2; A cluster state change: WatchedEvent state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred - updating... (live nodes size: 4)
INFO - 2013-11-02 17:40:42.266; org.apache.solr.servlet.SolrDispatchFilter; [admin] webapp=null path=/admin/cores params={indexInfo=false&_=1383439243017&wt=json} status=0 QTime=2
INFO - 2013-11-02 17:40:42.408; org.apache.solr.servlet.SolrDispatchFilter; [admin] webapp=null path=/admin/info/system params={_=1383439243093&wt=json} status=0 QTime=90
INFO - 2013-11-02 17:40:43.554; org.apache.solr.servlet.SolrDispatchFilter; [admin] webapp=null path=/admin/info/logging params={_=1383439244330&since=0&wt=json} status=0 QTime=14
------------------------------------------------------------------------

Even if I shut down one of the nodes in the cluster, it does not recover and throws the same error.

I did some more investigation and compared it with our QA1 environment.
I found out that:

1. When I start the cluster for the first time and add cores to it, whether from the Admin console or through a URL, it behaves identically in QA1 and UAT.

2. Once I shut down the cluster, I see some entries in the ZooKeeper overseer section in QA1 (which is running properly), but they do not show up in UAT:

   [zkhost:2181(CONNECTED) 3] ls /overseer/queue
   [qn-0000004634, qn-0000004633, qn-0000004632, qn-0000004631, qn-0000004630, qn-0000004616, qn-0000004615, qn-0000004619, qn-0000004618, qn-0000004617, qn-0000004621, qn-0000004620, qn-0000004623, qn-0000004622, qn-0000004625, qn-0000004624, qn-0000004627, qn-0000004626, qn-0000004629, qn-0000004628]

   These entries disappear as soon as the cluster starts. I think they help SolrCloud to recover.

3. Moreover, when I shut down the cluster, the collection under the collections directory in ZooKeeper disappears in the UAT environment but stays in the QA1 environment:

   [zkhost:2181(CONNECTED) 3] ls /collections

4. The lifecycle of the index write.lock in the UAT environment:
   1. When the core first gets created, write.lock appears with 2 empty segment files.
   2. write.lock stays during dataimport and subsequent searches.
   3. After cluster shutdown, write.lock disappears.
   4. During cluster restart, write.lock appears again and eventually becomes the reason for the error.
   5. Sometimes one or two of the 4 instances recover.

I tried upgrading to 4.5.1 because I saw some overseer-related bug fixes in 4.5.1, but that did not fix things either; I see the same errors there too.

Should I go back to 4.0/4.4? Has anybody else faced these problems? Please help.

Thanks,
Kaustubh

--
View this message in context: http://lucene.472066.n3.nabble.com/unable-to-load-core-after-cluster-restart-tp4098731p4098991.html
Sent from the Solr - User mailing list archive at Nabble.com.
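
For anyone who wants to repeat the write.lock checks from point 4 on their own nodes, here is a minimal sketch in Python of how the lock files could be listed before and after a restart. The function name find_write_locks is only an illustrative assumption and is not part of Solr. With the native lock factory, the mere presence of a write.lock file does not prove that the lock is currently held, so the output is a hint rather than a diagnosis.

```python
import os
import time


def find_write_locks(solr_home):
    """Return a sorted list of (path, age_in_seconds) for each write.lock under solr_home.

    Hypothetical helper: it only inspects the filesystem and does not
    talk to Solr or ZooKeeper.
    """
    found = []
    now = time.time()
    for dirpath, _dirnames, filenames in os.walk(solr_home):
        if "write.lock" not in filenames:
            continue
        lock_path = os.path.join(dirpath, "write.lock")
        try:
            age = now - os.path.getmtime(lock_path)
        except OSError:
            # The file vanished between the directory walk and the stat call.
            continue
        found.append((lock_path, age))
    return sorted(found)
```

Running find_write_locks on each node's Solr home once after shutdown and once during restart should show whether the lock file really disappears and reappears as described in point 4.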