Thanks Yonik, I'll try changing the lock type to see how that works.
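For reference, here's a sketch of the change I'm planning in solrconfig.xml (assuming the stock 1.3 layout, where lockType lives under indexDefaults):

```xml
<indexDefaults>
  <!-- was "single": an in-JVM lock, invisible to other processes.
       "simple" writes a lock file on disk so a second process can see it. -->
  <lockType>simple</lockType>
</indexDefaults>
```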

Looking closer at the logs, I see the app was started on Oct 28, 2008 at 9:49:38, 
but not long afterwards it hit its first exception while warming the index:

INFO: [] webapp=/solr path=/update params={} status=0 QTime=3
Oct 28, 2008 9:49:47 PM org.apache.solr.common.SolrException log
SEVERE: Error during auto-warming of key:[EMAIL PROTECTED]:java.lang.OutOfMemoryError: Java heap space

2008-10-28 21:49:47.674::INFO:  Shutdown hook executing
Oct 28, 2008 9:49:47 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.InterruptedException
        at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:915)
        at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1217)
        at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:218)
        at java.util.concurrent.FutureTask.get(FutureTask.java:83)
        at 
org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:389)
        at 
org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:77)
        at 
org.apache.solr.handler.XmlUpdateRequestHandler.processUpdate(XmlUpdateRequestHandler.java:226)
        at 
org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:123)
...
Oct 28, 2008 9:49:47 PM org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: {commit=} 0 81896
Oct 28, 2008 9:49:47 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/update params={} status=0 QTime=81896
Oct 28, 2008 9:49:47 PM org.apache.solr.core.SolrCore close
INFO: []  CLOSING SolrCore [EMAIL PROTECTED]
Oct 28, 2008 9:49:47 PM org.apache.solr.core.SolrCore closeSearcher
INFO: [] Closing main searcher on request.
Oct 28, 2008 9:49:47 PM org.apache.solr.update.DirectUpdateHandler2 close
INFO: closing 
DirectUpdateHandler2{commits=7,autocommits=0,optimizes=1,docsPending=10,adds=10,deletesById=0,deletesByQuery=0,errors=0,cumulative_adds=11511,cumulative_deletesById=0,cumulative_deletesByQuery=0,cumulative_errors=1}
2008-10-28 21:49:48.744::INFO:  Shutdown hook complete
Oct 28, 2008 9:49:52 PM org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: {add=[Inta:2113254]} 0 3
Oct 28, 2008 9:49:52 PM org.apache.solr.core.SolrCore execute

Then it seemed to run well for about an hour, until I saw this:

Oct 28, 2008 10:38:51 PM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start commit(optimize=false,waitFlush=true,waitSearcher=true)
Oct 28, 2008 10:38:51 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.RuntimeException: after flush: fdx size mismatch: 1156 docs 
vs 0 length in bytes of _2rv.fdx
        at 
org.apache.lucene.index.StoredFieldsWriter.closeDocStore(StoredFieldsWriter.java:94)
        at 
org.apache.lucene.index.DocFieldConsumers.closeDocStore(DocFieldConsumers.java:83)
        at 
org.apache.lucene.index.DocFieldProcessor.closeDocStore(DocFieldProcessor.java:47)
        at 
org.apache.lucene.index.DocumentsWriter.closeDocStore(DocumentsWriter.java:367)
        at 
org.apache.lucene.index.IndexWriter.flushDocStores(IndexWriter.java:1774)
...
Oct 28, 2008 10:38:53 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed 
out: SingleInstanceLock: write.lock
        at org.apache.lucene.store.Lock.obtain(Lock.java:85)
        at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1140)
        at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:938)
        at 
org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:116)
        at 
org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:122)
        at 
org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:167)
        at 
org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:221)
        at 
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:59)
        at 
org.apache.solr.handler.XmlUpdateRequestHandler.processUpdate(XmlUpdateRequestHandler.java:196)

These lock errors continued for about an hour, before what appears to be a 
successful commit/warm occurred. Then things looked normal for about another 
half hour, until the missing index file exceptions below appeared. 

There really should only be one solr process running with access to this index 
in my environment, so I'm puzzled as to how two processes could mess up the index. 
The only thing that comes to mind is that I'm running monit to monitor my key 
processes, including solr. It's set to bounce the port if it seems to be 
struggling. Maybe that bounce isn't happening cleanly and I somehow end up with 
overlapping processes. Seems unlikely, but who knows. I'll look into that too.
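As a first sanity check on that theory, something along these lines could flag a write.lock left behind by an unclean restart. This is just a hypothetical helper sketch; the class name, the 30-minute threshold, and treating lock-file age as "staleness" are all my assumptions, not anything Solr or Lucene provides:

```java
import java.io.File;

public class StaleLockCheck {

    /**
     * Returns true if a Lucene write.lock exists under indexDir and is
     * older than maxAgeMillis -- a hint that a previous writer died (or
     * was killed by monit) without releasing it.
     */
    static boolean isLockStale(File indexDir, long maxAgeMillis) {
        File lock = new File(indexDir, "write.lock");
        if (!lock.exists()) {
            return false;
        }
        return System.currentTimeMillis() - lock.lastModified() > maxAgeMillis;
    }

    public static void main(String[] args) {
        // Example: warn if the lock file is more than 30 minutes old.
        File indexDir = new File("/var/local/solr/data/production/index");
        if (isLockStale(indexDir, 30 * 60 * 1000L)) {
            System.out.println("write.lock looks stale -- check for overlapping solr processes");
        }
    }
}
```

Note this only makes sense once the lockType is file-based (simple); the single lock lives purely in the JVM and leaves nothing on disk to inspect.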

Also, I'm in the process of moving solr to another host with more memory, since 
the box I'm on is pretty tight.

thanks!
Bill


----- Original Message ----
From: Yonik Seeley <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Thursday, October 30, 2008 8:52:32 AM
Subject: Re: corrupt solr index on ec2

On Thu, Oct 30, 2008 at 2:06 AM, Bill Graham <[EMAIL PROTECTED]> wrote:
> I've been running solr 1.3 on an ec2 instance for a couple of weeks and I've 
> had some stability issues. It seems like I need to bounce the app once a day. 
> That I could live with and ultimately maybe troubleshoot, but what's more 
> disturbing is that three times in the last 2 weeks my index has been 
> corrupted when FileNotFoundExceptions started to appear.
>
> I'm running in jetty and had my index on the local file system until I lost 
> the index the first time. Then I moved it to my mounted ebs volume so I could 
> restore from a snapshot if needed. I'm wondering if perhaps there are issues 
> with the locking mechanism on either the local directory (which is really a 
> virtual instance), or the mounted xfs volume. Has anyone seen this, or have 
> suggestions re the cause? I'm using the single lockType.
>
> I'm running a single solr instance that gets frequent updates from multiple 
> threads, and commits about every hour.
>
> A few things I see in the logs:
>
> - From time to time I see write lock timeouts:
> SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed 
> out: SingleInstanceLock: write.lock

This is really strange.  It suggests that there is another in-process
writer that is holding the lock.  That should be impossible, unless
it's caused by a previous exception trying to open an IndexWriter and
the lock is simply stale.  What seems to be the first exception that
occurs?

Also, you might try changing the lock type from single to simple to
make it visible cross-process.  That would rule out trying to start
another solr instance on the same index directory.... opening two
writers on the same directory is one way to get missing files like you
appear to have.

> - I've seen OOM exceptions during warming. I've changed 
> maxWarmingSearchers=1, which I suspect will do the trick

OOM errors are really tricky - if they happen in the wrong place, it's
hard to recover from gracefully. Correct cleanup after an OOM
error in the IndexWriter recently got some small fixes in Lucene
trunk - you might want to try the latest dev version of Lucene and see
if it helps.


-Yonik


> - Then finally, this is what I found in the logs today when the index got 
> corrupt:
>
> Oct 29, 2008 12:18:39 AM org.apache.solr.update.DirectUpdateHandler2 commit
> INFO: start commit(optimize=false,waitFlush=true,waitSearcher=true)
> Oct 29, 2008 12:18:41 AM org.apache.solr.common.SolrException log
> SEVERE: java.lang.RuntimeException: java.io.FileNotFoundException: 
> /var/local/solr/data/production/index/_2rv.fdt (No such file or directory)
>        at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:960)
>        at 
> org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:368)
>        at 
> org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:77)
>        at 
> org.apache.solr.handler.XmlUpdateRequestHandler.processUpdate(XmlUpdateRequestHandler.java:226)
>        at 
> org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:123)
>        at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
>        at 
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
>        at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
>        at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
>        at 
> org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
>        at 
> org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>        at 
> org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
>        at 
> org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
>        at 
> org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
>        at 
> org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
>        at 
> org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
>        at 
> org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
>        at org.mortbay.jetty.Server.handle(Server.java:285)
>        at 
> org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
>        at 
> org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835)
>        at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641)
>        at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:202)
>        at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
>        at 
> org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
>        at 
> org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
> Caused by: java.io.FileNotFoundException: 
> /var/local/solr/data/production/index/_2rv.fdt (No such file or directory)
>        at java.io.RandomAccessFile.open(Native Method)
>        at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
>        at 
> org.apache.lucene.store.FSDirectory$FSIndexInput$Descriptor.<init>(FSDirectory.java:552)
>        at 
> org.apache.lucene.store.FSDirectory$FSIndexInput.<init>(FSDirectory.java:582)
>        at org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:488)
>        at org.apache.lucene.index.FieldsReader.<init>(FieldsReader.java:77)
>        at 
> org.apache.lucene.index.SegmentReader.initialize(SegmentReader.java:355)
>        at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:304)
>        at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:226)
>        at 
> org.apache.lucene.index.MultiSegmentReader.<init>(MultiSegmentReader.java:56)
>        at 
> org.apache.lucene.index.ReadOnlyMultiSegmentReader.<init>(ReadOnlyMultiSegmentReader.java:27)
>        at 
> org.apache.lucene.index.DirectoryIndexReader$1.doBody(DirectoryIndexReader.java:101)
>        at 
> org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:653)
>        at 
> org.apache.lucene.index.DirectoryIndexReader.open(DirectoryIndexReader.java:111)
>        at org.apache.lucene.index.IndexReader.open(IndexReader.java:316)
>        at org.apache.lucene.index.IndexReader.open(IndexReader.java:237)
>        at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:951)
>        ... 25 more
>
>
> Oct 29, 2008 12:18:41 AM org.apache.solr.core.SolrCore execute
> INFO: [] webapp=/solr path=/update params={} status=500 QTime=1149
> Oct 29, 2008 12:18:41 AM org.apache.solr.common.SolrException log
> SEVERE: java.lang.RuntimeException: java.io.FileNotFoundException: 
> /var/local/solr/data/production/index/_2rv.fdt (No such file or directory)
>        at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:960)
>        at 
> org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:368)
>        at 
> org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:77)
>        at 
> org.apache.solr.handler.XmlUpdateRequestHandler.processUpdate(XmlUpdateRequestHandler.java:226)
>        at 
> org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:123)
>        at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
>        at 
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
>        at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
>        at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
>        at 
> org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
>        at 
> org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>        at 
> org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
>        at 
> org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
>        at 
> org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
>        at 
> org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
>        at 
> org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
>        at 
> org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
>        at org.mortbay.jetty.Server.handle(Server.java:285)
>        at 
> org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
>        at 
> org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835)
>        at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641)
>        at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:202)
>        at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
>        at 
> org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
>        at 
> org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
> Caused by: java.io.FileNotFoundException: 
> /var/local/solr/data/production/index/_2rv.fdt (No such file or directory)
>        at java.io.RandomAccessFile.open(Native Method)
>        at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
>        at 
> org.apache.lucene.store.FSDirectory$FSIndexInput$Descriptor.<init>(FSDirectory.java:552)
>        at 
> org.apache.lucene.store.FSDirectory$FSIndexInput.<init>(FSDirectory.java:582)
>        at org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:488)
>        at org.apache.lucene.index.FieldsReader.<init>(FieldsReader.java:77)
>        at 
> org.apache.lucene.index.SegmentReader.initialize(SegmentReader.java:355)
>        at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:304)
>        at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:226)
>        at 
> org.apache.lucene.index.MultiSegmentReader.<init>(MultiSegmentReader.java:56)
>        at 
> org.apache.lucene.index.ReadOnlyMultiSegmentReader.<init>(ReadOnlyMultiSegmentReader.java:27)
>        at 
> org.apache.lucene.index.DirectoryIndexReader$1.doBody(DirectoryIndexReader.java:101)
>        at 
> org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:653)
>        at 
> org.apache.lucene.index.DirectoryIndexReader.open(DirectoryIndexReader.java:111)
>        at org.apache.lucene.index.IndexReader.open(IndexReader.java:316)
>        at org.apache.lucene.index.IndexReader.open(IndexReader.java:237)
>        at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:951)
>        ... 25 more
>
> [ then things look ok for a while, but these exceptions start to appear ]
>
> Oct 29, 2008 12:41:45 AM org.apache.solr.core.SolrCore execute
> INFO: [] webapp=/solr path=/update params={} status=0 QTime=1291
> Exception in thread "Lucene Merge Thread #0" 
> org.apache.lucene.index.MergePolicy$MergeException: 
> java.io.FileNotFoundException: /var/local/solr/data/production/index/_2rv.fdt 
> (No such file or directory)
>        at 
> org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:314)
>        at 
> org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:291)
> Caused by: java.io.FileNotFoundException: 
> /var/local/solr/data/production/index/_2rv.fdt (No such file or directory)
>        at java.io.RandomAccessFile.open(Native Method)
>        at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
>        at 
> org.apache.lucene.store.FSDirectory$FSIndexInput$Descriptor.<init>(FSDirectory.java:552)
>        at 
> org.apache.lucene.store.FSDirectory$FSIndexInput.<init>(FSDirectory.java:582)
>        at org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:488)
>        at org.apache.lucene.index.FieldsReader.<init>(FieldsReader.java:77)
>        at 
> org.apache.lucene.index.SegmentReader.initialize(SegmentReader.java:355)
>        at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:304)
>        at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:258)
>        at 
> org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4280)
>        at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3932)
>        at 
> org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:205)
>        at 
> org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:260)
>
> Any ideas?
>
> thanks,
> Bill



      
