Thanks Yonik, I'll try changing the lock type to see how that works. Looking closer at the logs I see the app was started at Oct 28, 2008 9:49:38, but not long afterwards it got its first exception when warming the index:
INFO: [] webapp=/solr path=/update params={} status=0 QTime=3
Oct 28, 2008 9:49:47 PM org.apache.solr.common.SolrException log
SEVERE: Error during auto-warming of key:[EMAIL PROTECTED]:java.lang.OutOfMemoryError: Java heap space
2008-10-28 21:49:47.674::INFO: Shutdown hook executing
Oct 28, 2008 9:49:47 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.InterruptedException
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:915)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1217)
        at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:218)
        at java.util.concurrent.FutureTask.get(FutureTask.java:83)
        at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:389)
        at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:77)
        at org.apache.solr.handler.XmlUpdateRequestHandler.processUpdate(XmlUpdateRequestHandler.java:226)
        at org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:123)
        ...
Oct 28, 2008 9:49:47 PM org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: {commit=} 0 81896
Oct 28, 2008 9:49:47 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/update params={} status=0 QTime=81896
Oct 28, 2008 9:49:47 PM org.apache.solr.core.SolrCore close
INFO: [] CLOSING SolrCore [EMAIL PROTECTED]
Oct 28, 2008 9:49:47 PM org.apache.solr.core.SolrCore closeSearcher
INFO: [] Closing main searcher on request.
Oct 28, 2008 9:49:47 PM org.apache.solr.update.DirectUpdateHandler2 close
INFO: closing DirectUpdateHandler2{commits=7,autocommits=0,optimizes=1,docsPending=10,adds=10,deletesById=0,deletesByQuery=0,errors=0,cumulative_adds=11511,cumulative_deletesById=0,cumulative_deletesByQuery=0,cumulative_errors=1}
2008-10-28 21:49:48.744::INFO: Shutdown hook complete
Oct 28, 2008 9:49:52 PM org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: {add=[Inta:2113254]} 0 3
Oct 28, 2008 9:49:52 PM org.apache.solr.core.SolrCore execute

Then it seemed to run well for about an hour and I saw this:

Oct 28, 2008 10:38:51 PM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start commit(optimize=false,waitFlush=true,waitSearcher=true)
Oct 28, 2008 10:38:51 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.RuntimeException: after flush: fdx size mismatch: 1156 docs vs 0 length in bytes of _2rv.fdx
        at org.apache.lucene.index.StoredFieldsWriter.closeDocStore(StoredFieldsWriter.java:94)
        at org.apache.lucene.index.DocFieldConsumers.closeDocStore(DocFieldConsumers.java:83)
        at org.apache.lucene.index.DocFieldProcessor.closeDocStore(DocFieldProcessor.java:47)
        at org.apache.lucene.index.DocumentsWriter.closeDocStore(DocumentsWriter.java:367)
        at org.apache.lucene.index.IndexWriter.flushDocStores(IndexWriter.java:1774)
        ...
Oct 28, 2008 10:38:53 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: SingleInstanceLock: write.lock
        at org.apache.lucene.store.Lock.obtain(Lock.java:85)
        at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1140)
        at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:938)
        at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:116)
        at org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:122)
        at org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:167)
        at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:221)
        at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:59)
        at org.apache.solr.handler.XmlUpdateRequestHandler.processUpdate(XmlUpdateRequestHandler.java:196)

These lock errors continued for about an hour before what appears to be a successful commit/warm occurred. Things then looked normal for about another half hour, until the missing-index-file exceptions below appeared. There really should be only one Solr process with access to this index in my environment, so I'm puzzled as to how two processes could mess up the index. The only thing that comes to mind is that I'm running monit to monitor my key processes, including Solr. It's set to bounce the process if it seems to be struggling. Maybe that bounce isn't happening cleanly and I somehow end up with overlapping processes. Seems unlikely, but who knows; I'll look into that too. Also, I'm in the process of moving Solr to another host with more memory, since the box I'm on is pretty tight on memory.

thanks!
Bill

----- Original Message ----
From: Yonik Seeley <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Thursday, October 30, 2008 8:52:32 AM
Subject: Re: corrupt solr index on ec2

On Thu, Oct 30, 2008 at 2:06 AM, Bill Graham <[EMAIL PROTECTED]> wrote:
> I've been running solr 1.3 on an ec2 instance for a couple of weeks and I've
> had some stability issues. It seems like I need to bounce the app once a day.
> That I could live with and ultimately maybe troubleshoot, but what's more
> disturbing is that three times in the last 2 weeks my index has been
> corrupted when FileNotFoundExceptions started to appear.
>
> I'm running in jetty and had my index on the local file system until I lost
> the index the first time. Then I moved it to my mounted ebs volume so I could
> restore from a snapshot if needed. I'm wondering if perhaps there are issues
> with the locking mechanism on either the local directory (which is really a
> virtual instance), or the mounted xfs volume. Has anyone seen this, or have
> suggestions re the cause? I'm using the single lockType.
>
> I'm running a single solr instance that gets frequent updates from multiple
> threads, and commits about every hour.
>
> A few things I see in the logs:
>
> - From time to time I see write lock timeouts:
> SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed
> out: SingleInstanceLock: write.lock

This is really strange. It suggests that there is another in-process writer that is holding the lock. That should be impossible, unless it's caused by a previous exception while trying to open an IndexWriter, and the lock is simply stale. What seems to be the first exception that occurs?

Also, you might try changing the lock type from single to simple to make it visible cross-process. That would rule out trying to start another solr instance on the same index directory... opening two writers on the same directory is one way to get missing files like you appear to have.
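For reference, the lock type is configured in solrconfig.xml. A sketch of what the change might look like (element placement assumed from the Solr 1.3 example config; check your own solrconfig.xml for where lockType lives in your setup):

```xml
<!-- Switch from the in-JVM "single" lock to the file-based "simple"
     lock, so the write lock is a file in the index directory that is
     visible to any other process opening the same index. -->
<mainIndex>
  <lockType>simple</lockType>
</mainIndex>
```

With "simple", a stale write.lock file left behind by a crashed process will block new writers until it is removed, which is noisier but makes cross-process conflicts visible instead of silent.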
> - I've seen OOM exceptions during warming. I've changed
> maxWarmingSearchers=1, which I suspect will do the trick

OOM errors are really tricky - if they happen in the wrong place, it's hard to recover from gracefully. There were recently some small fixes in Lucene trunk for correctly cleaning up after an OOM error in the IndexWriter - you might want to try the latest dev version of Lucene and see if it helps.

-Yonik

> - Then finally, this is what I found in the logs today when the index got
> corrupt:
>
> Oct 29, 2008 12:18:39 AM org.apache.solr.update.DirectUpdateHandler2 commit
> INFO: start commit(optimize=false,waitFlush=true,waitSearcher=true)
> Oct 29, 2008 12:18:41 AM org.apache.solr.common.SolrException log
> SEVERE: java.lang.RuntimeException: java.io.FileNotFoundException:
> /var/local/solr/data/production/index/_2rv.fdt (No such file or directory)
>         at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:960)
>         at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:368)
>         at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:77)
>         at org.apache.solr.handler.XmlUpdateRequestHandler.processUpdate(XmlUpdateRequestHandler.java:226)
>         at org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:123)
>         at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>         at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
>         at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
>         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
>         at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
>         at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
>         at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>         at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
>         at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
>         at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
>         at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
>         at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
>         at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
>         at org.mortbay.jetty.Server.handle(Server.java:285)
>         at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
>         at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835)
>         at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641)
>         at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:202)
>         at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
>         at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
>         at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
> Caused by: java.io.FileNotFoundException:
> /var/local/solr/data/production/index/_2rv.fdt (No such file or directory)
>         at java.io.RandomAccessFile.open(Native Method)
>         at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
>         at org.apache.lucene.store.FSDirectory$FSIndexInput$Descriptor.<init>(FSDirectory.java:552)
>         at org.apache.lucene.store.FSDirectory$FSIndexInput.<init>(FSDirectory.java:582)
>         at org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:488)
>         at org.apache.lucene.index.FieldsReader.<init>(FieldsReader.java:77)
>         at org.apache.lucene.index.SegmentReader.initialize(SegmentReader.java:355)
>         at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:304)
>         at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:226)
>         at org.apache.lucene.index.MultiSegmentReader.<init>(MultiSegmentReader.java:56)
>         at org.apache.lucene.index.ReadOnlyMultiSegmentReader.<init>(ReadOnlyMultiSegmentReader.java:27)
>         at org.apache.lucene.index.DirectoryIndexReader$1.doBody(DirectoryIndexReader.java:101)
>         at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:653)
>         at org.apache.lucene.index.DirectoryIndexReader.open(DirectoryIndexReader.java:111)
>         at org.apache.lucene.index.IndexReader.open(IndexReader.java:316)
>         at org.apache.lucene.index.IndexReader.open(IndexReader.java:237)
>         at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:951)
>         ... 25 more
>
> Oct 29, 2008 12:18:41 AM org.apache.solr.core.SolrCore execute
> INFO: [] webapp=/solr path=/update params={} status=500 QTime=1149
> Oct 29, 2008 12:18:41 AM org.apache.solr.common.SolrException log
> SEVERE: java.lang.RuntimeException: java.io.FileNotFoundException:
> /var/local/solr/data/production/index/_2rv.fdt (No such file or directory)
>         at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:960)
>         at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:368)
>         at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:77)
>         at org.apache.solr.handler.XmlUpdateRequestHandler.processUpdate(XmlUpdateRequestHandler.java:226)
>         at org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:123)
>         at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>         at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
>         at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
>         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
>         at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
>         at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
>         at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>         at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
>         at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
>         at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
>         at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
>         at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
>         at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
>         at org.mortbay.jetty.Server.handle(Server.java:285)
>         at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
>         at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835)
>         at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641)
>         at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:202)
>         at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
>         at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
>         at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
> Caused by: java.io.FileNotFoundException:
> /var/local/solr/data/production/index/_2rv.fdt (No such file or directory)
>         at java.io.RandomAccessFile.open(Native Method)
>         at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
>         at org.apache.lucene.store.FSDirectory$FSIndexInput$Descriptor.<init>(FSDirectory.java:552)
>         at org.apache.lucene.store.FSDirectory$FSIndexInput.<init>(FSDirectory.java:582)
>         at org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:488)
>         at org.apache.lucene.index.FieldsReader.<init>(FieldsReader.java:77)
>         at org.apache.lucene.index.SegmentReader.initialize(SegmentReader.java:355)
>         at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:304)
>         at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:226)
>         at org.apache.lucene.index.MultiSegmentReader.<init>(MultiSegmentReader.java:56)
>         at org.apache.lucene.index.ReadOnlyMultiSegmentReader.<init>(ReadOnlyMultiSegmentReader.java:27)
>         at org.apache.lucene.index.DirectoryIndexReader$1.doBody(DirectoryIndexReader.java:101)
>         at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:653)
>         at org.apache.lucene.index.DirectoryIndexReader.open(DirectoryIndexReader.java:111)
>         at org.apache.lucene.index.IndexReader.open(IndexReader.java:316)
>         at org.apache.lucene.index.IndexReader.open(IndexReader.java:237)
>         at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:951)
>         ... 25 more
>
> [ then things look ok for a while, but these exceptions start to appear ]
>
> Oct 29, 2008 12:41:45 AM org.apache.solr.core.SolrCore execute
> INFO: [] webapp=/solr path=/update params={} status=0 QTime=1291
> Exception in thread "Lucene Merge Thread #0"
> org.apache.lucene.index.MergePolicy$MergeException:
> java.io.FileNotFoundException: /var/local/solr/data/production/index/_2rv.fdt
> (No such file or directory)
>         at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:314)
>         at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:291)
> Caused by: java.io.FileNotFoundException:
> /var/local/solr/data/production/index/_2rv.fdt (No such file or directory)
>         at java.io.RandomAccessFile.open(Native Method)
>         at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
>         at org.apache.lucene.store.FSDirectory$FSIndexInput$Descriptor.<init>(FSDirectory.java:552)
>         at org.apache.lucene.store.FSDirectory$FSIndexInput.<init>(FSDirectory.java:582)
>         at org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:488)
>         at org.apache.lucene.index.FieldsReader.<init>(FieldsReader.java:77)
>         at org.apache.lucene.index.SegmentReader.initialize(SegmentReader.java:355)
>         at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:304)
>         at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:258)
>         at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4280)
>         at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3932)
>         at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:205)
>         at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:260)
>
> Any ideas?
>
> thanks,
> Bill
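The cross-process visibility Yonik describes can be illustrated with plain JDK file locks, with no Lucene involved. A "single" lock exists only inside one JVM; a file-based lock like "simple" leaves a marker on disk that a second writer can see. A minimal sketch (the LockDemo class and its method name are made up for illustration, built only on java.nio.channels.FileLock):

```java
import java.io.File;
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;

public class LockDemo {
    // Acquire a file lock (playing the role of write.lock), then have a
    // second channel try to grab it. Returns true if the second attempt
    // is rejected while the first lock is held.
    public static boolean secondWriterBlocked() throws Exception {
        File lockFile = File.createTempFile("write", ".lock");
        lockFile.deleteOnExit();
        try (RandomAccessFile first = new RandomAccessFile(lockFile, "rw");
             FileChannel firstChannel = first.getChannel();
             FileLock held = firstChannel.tryLock()) {
            // 'held' is the writer that owns the lock file.
            try (RandomAccessFile second = new RandomAccessFile(lockFile, "rw");
                 FileChannel secondChannel = second.getChannel()) {
                try {
                    FileLock stolen = secondChannel.tryLock();
                    // A separate process would see null here; within one
                    // JVM the JDK detects the overlap and throws instead.
                    if (stolen != null) stolen.release();
                    return false;
                } catch (OverlappingFileLockException e) {
                    // Second writer is refused -- loosely analogous to
                    // Lucene's LockObtainFailedException.
                    return true;
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("second writer blocked: " + secondWriterBlocked());
    }
}
```

The flip side of this visibility is staleness: if the owning process dies without deleting the lock file, later writers stay blocked until it is cleaned up manually, which matches the "stale lock" scenario Yonik mentions.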