Hi Shawn,

One thing I forgot to mention is that the same setup (without the bootstrap
option) is working fine in our QA1 environment. I did not have the bootstrap
option from the start; I added it thinking it would solve the problem.

Nonetheless, I followed your instructions wherever they differed from my
old approach:
1. I moved my zkHost from the JVM options to solr.xml and added a chroot to it
2. I removed the bootstrap option
3. I created the collections with the URL template you suggested (I had tried
it earlier too)
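
For reference, here is a minimal sketch of the two strings involved in steps 1 and 3. It assumes the standard Collections API CREATE parameters rather than the exact template from the earlier mail, and the host names, shard counts, and config name are placeholders, not my real values. The helper names are made up for illustration.

```python
from urllib.parse import urlencode


def zk_host_with_chroot(hosts, chroot):
    """Join ZooKeeper host:port pairs and append the chroot once, at the end."""
    return ",".join(hosts) + "/" + chroot.strip("/")


def create_collection_url(base_url, name, num_shards, replication_factor, config_name):
    """Build a Collections API CREATE request URL (parameter values are examples)."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
        "collection.configName": config_name,
    }
    return base_url.rstrip("/") + "/admin/collections?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical hosts and values, for illustration only.
    print(zk_host_with_chroot(["zk1:2181", "zk2:2181", "zk3:2181"], "/solr"))
    print(create_collection_url("http://localhost:8080/solr", "xyz", 2, 2, "xyzconf"))
```

The point of the first helper is that the chroot goes after the last host only, not after every host in the list.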

None of it worked for me. I am still seeing the same errors. I am adding some
more logs from before and after the error occurs.


-------------------------------------------------

INFO  - 2013-11-02 17:40:40.427;
org.apache.solr.update.DefaultSolrCoreState; closing IndexWriter with
IndexWriterCloser
INFO  - 2013-11-02 17:40:40.428; org.apache.solr.core.SolrCore; [xyz]
Closing main searcher on request.
INFO  - 2013-11-02 17:40:40.431;
org.apache.solr.core.CachingDirectoryFactory; Closing
NRTCachingDirectoryFactory - 1 directories currently being tracked
INFO  - 2013-11-02 17:40:40.432;
org.apache.solr.core.CachingDirectoryFactory; looking to close
/mnt/emc/App_name/data-UAT-refresh/SolrCloud/SolrHome2/solr/xyz/data
[CachedDir<<refCount=0;path=/mnt/emc/App_name/data-UAT-refresh/SolrCloud/SolrHome2/solr/xyz/data;done=false>>]
INFO  - 2013-11-02 17:40:40.432;
org.apache.solr.core.CachingDirectoryFactory; Closing directory:
/mnt/emc/App_name/data-UAT-refresh/SolrCloud/SolrHome2/solr/xyz/data
ERROR - 2013-11-02 17:40:40.433; org.apache.solr.core.CoreContainer; Unable
to create core: xyz
org.apache.solr.common.SolrException: Error opening new searcher
        at org.apache.solr.core.SolrCore.<init>(SolrCore.java:834)
        at org.apache.solr.core.SolrCore.<init>(SolrCore.java:625)
        at org.apache.solr.core.ZkContainer.createFromZk(ZkContainer.java:256)
        at org.apache.solr.core.CoreContainer.create(CoreContainer.java:555)
        at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:247)
        at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:239)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)
Caused by: org.apache.solr.common.SolrException: Error opening new searcher
        at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1477)
        at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1589)
        at org.apache.solr.core.SolrCore.<init>(SolrCore.java:821)
        ... 13 more
Caused by: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: NativeFSLock@/mnt/emc/App_name/data-UAT-refresh/SolrCloud/SolrHome2/solr/xyz/data/index/write.lock
        at org.apache.lucene.store.Lock.obtain(Lock.java:84)
        at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:695)
        at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:77)
        at org.apache.solr.update.SolrIndexWriter.create(SolrIndexWriter.java:64)
        at org.apache.solr.update.DefaultSolrCoreState.createMainIndexWriter(DefaultSolrCoreState.java:267)
        at org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:110)
        at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1440)
        ... 15 more
ERROR - 2013-11-02 17:40:40.443; org.apache.solr.common.SolrException;
null:org.apache.solr.common.SolrException: Unable to create core: xyz
        at org.apache.solr.core.CoreContainer.recordAndThrow(CoreContainer.java:934)
        at org.apache.solr.core.CoreContainer.create(CoreContainer.java:566)
        at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:247)
        at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:239)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)
Caused by: org.apache.solr.common.SolrException: Error opening new searcher
        at org.apache.solr.core.SolrCore.<init>(SolrCore.java:834)
        at org.apache.solr.core.SolrCore.<init>(SolrCore.java:625)
        at org.apache.solr.core.ZkContainer.createFromZk(ZkContainer.java:256)
        at org.apache.solr.core.CoreContainer.create(CoreContainer.java:555)
        ... 10 more
Caused by: org.apache.solr.common.SolrException: Error opening new searcher
        at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1477)
        at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1589)
        at org.apache.solr.core.SolrCore.<init>(SolrCore.java:821)
        ... 13 more
Caused by: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: NativeFSLock@/mnt/emc/App_name/data-UAT-refresh/SolrCloud/SolrHome2/solr/xyz/data/index/write.lock
        at org.apache.lucene.store.Lock.obtain(Lock.java:84)
        at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:695)
        at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:77)
        at org.apache.solr.update.SolrIndexWriter.create(SolrIndexWriter.java:64)
        at org.apache.solr.update.DefaultSolrCoreState.createMainIndexWriter(DefaultSolrCoreState.java:267)
        at org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:110)
        at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1440)
        ... 15 more

INFO  - 2013-11-02 17:40:40.445; org.apache.solr.servlet.SolrDispatchFilter;
user.dir=/usr/wbol/glassfish3/glassfish/nodes/UAT-refresh-App_name-app02/SolrCloud_02/config
INFO  - 2013-11-02 17:40:40.446; org.apache.solr.servlet.SolrDispatchFilter;
SolrDispatchFilter.init() done
ERROR - 2013-11-02 17:40:40.609; org.apache.solr.update.SolrIndexWriter;
SolrIndexWriter was not closed prior to finalize(), indicates a bug --
POSSIBLE RESOURCE LEAK!!!
ERROR - 2013-11-02 17:40:40.627; org.apache.solr.update.SolrIndexWriter;
Error closing IndexWriter, trying rollback
java.lang.NullPointerException
        at org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:984)
        at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:945)
        at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:907)
        at org.apache.solr.update.SolrIndexWriter.close(SolrIndexWriter.java:132)
        at org.apache.solr.update.SolrIndexWriter.finalize(SolrIndexWriter.java:185)
        at java.lang.ref.Finalizer.invokeFinalizeMethod(Native Method)
        at java.lang.ref.Finalizer.runFinalizer(Finalizer.java:83)
        at java.lang.ref.Finalizer.access$100(Finalizer.java:14)
        at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:160)
INFO  - 2013-11-02 17:40:41.928;
org.apache.solr.common.cloud.ZkStateReader$2; A cluster state change:
WatchedEvent state:SyncConnected type:NodeDataChanged
path:/clusterstate.json, has occurred - updating... (live nodes size: 4)
INFO  - 2013-11-02 17:40:41.928;
org.apache.solr.common.cloud.ZkStateReader$2; A cluster state change:
WatchedEvent state:SyncConnected type:NodeDataChanged
path:/clusterstate.json, has occurred - updating... (live nodes size: 4)
INFO  - 2013-11-02 17:40:42.266; org.apache.solr.servlet.SolrDispatchFilter;
[admin] webapp=null path=/admin/cores
params={indexInfo=false&_=1383439243017&wt=json} status=0 QTime=2 
INFO  - 2013-11-02 17:40:42.408; org.apache.solr.servlet.SolrDispatchFilter;
[admin] webapp=null path=/admin/info/system params={_=1383439243093&wt=json}
status=0 QTime=90 
INFO  - 2013-11-02 17:40:43.554; org.apache.solr.servlet.SolrDispatchFilter;
[admin] webapp=null path=/admin/info/logging
params={_=1383439244330&since=0&wt=json} status=0 QTime=14 


------------------------------------------------------------------------
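
To summarize the exception chain in a log like the one above, a small sketch along these lines could count the exception classes named on the `Caused by:` lines. It assumes the log has been saved to a text file; the function name and the file path argument are made up for illustration.

```python
import re
import sys
from collections import Counter

# Matches the exception class name at the start of a "Caused by:" line.
CAUSE = re.compile(r"^Caused by: ([\w.$]+)", re.MULTILINE)


def count_causes(log_text):
    """Count how often each exception class appears as a 'Caused by:' entry."""
    return Counter(CAUSE.findall(log_text))


if __name__ == "__main__":
    # Usage: python count_causes.py /path/to/solr.log  (path is an example)
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for cls, n in count_causes(handle.read()).most_common():
            print(n, cls)
```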

Even if I shut down just one of the nodes in the cluster, it does not recover
and throws the same error.

I did some more investigation and compared it with our QA1 environment.

I found that:

1. When I start the cluster for the first time and add cores to it, whether
from the Admin console or through a URL, it behaves identically in both
QA1 and UAT.


2. Once I shut down the cluster, I see some entries in the ZooKeeper overseer
section in QA1 (which is running properly), but they do not show up in UAT.

[zkhost:2181(CONNECTED) 3] ls /overseer/queue
[qn-0000004634, qn-0000004633, qn-0000004632, qn-0000004631, qn-0000004630,
qn-0000004616, qn-0000004615, qn-0000004619, qn-0000004618, qn-0000004617,
qn-0000004621, qn-0000004620, qn-0000004623, qn-0000004622, qn-0000004625,
qn-0000004624, qn-0000004627, qn-0000004626, qn-0000004629, qn-0000004628]

These entries disappear as soon as the cluster starts. I think they help
SolrCloud recover.

3. Moreover, when I shut down the cluster, the collection under /collections
in ZooKeeper disappears in the UAT environment but stays in the QA1
environment.
[zkhost:2181(CONNECTED) 3] ls /collections

4. Lifecycle of the index write.lock in the UAT environment:

   1. When the core first gets created, write.lock appears along with 2 empty
      segment files.
   2. write.lock stays during the dataimport and subsequent searches.
   3. After the cluster shuts down, write.lock disappears.
   4. During the cluster restart, write.lock appears again and eventually
      becomes the reason for the error.

5. Sometimes one or two of the 4 instances recover.
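
The write.lock lifecycle in point 4 can be watched with a small script run before shutdown, after shutdown, and during restart. This is only a sketch: it assumes each core keeps its index under <solr home>/<core>/data/index, as in the paths in the log, and the function name is made up for illustration.

```python
import sys
import time
from pathlib import Path


def find_write_locks(solr_home):
    """Return (core_name, lock_path, mtime) for each <core>/data/index/write.lock."""
    results = []
    for lock in sorted(Path(solr_home).glob("*/data/index/write.lock")):
        core = lock.parents[2].name  # index -> data -> core directory
        results.append((core, lock, lock.stat().st_mtime))
    return results


if __name__ == "__main__":
    # Usage: python find_locks.py /path/to/solr/home  (path is an example)
    for core, lock, mtime in find_write_locks(sys.argv[1]):
        stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(mtime))
        # Note: with native locking the file existing does not by itself
        # prove that a process currently holds the lock.
        print(core, lock, stamp)
```

Comparing the output at each stage between QA1 and UAT would show whether the lock file is left behind or recreated during restart.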

I tried upgrading to 4.5.1 because I saw some overseer-related bug fixes in
that release, but that did not fix things either. I see the same errors
there too.

Should I go back to 4.0/4.4? Has anybody else faced these problems?

Please help.

Thanks,
Kaustubh 

--
View this message in context: 
http://lucene.472066.n3.nabble.com/unable-to-load-core-after-cluster-restart-tp4098731p4098991.html
Sent from the Solr - User mailing list archive at Nabble.com.