On Sat, Aug 16, 2008 at 4:33 AM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
> What version of Java do you have on Linux?

The Java version on *Linux* (where I'm seeing the trouble):

    java version "1.6.0"
    OpenJDK Runtime Environment (build 1.6.0-b09)
    OpenJDK 64-Bit Server VM (build 1.6.0-b09, mixed mode)

I'm pretty sure this is the latest one from the Ubuntu repository.

Maybe I should try the official Sun HotSpot build instead. I'm not
finding any complaints about OpenJDK on the Lucene list, though.

The Java version on *Windows* (where I created the initial compound
format index) is an official Sun build:

    java version "1.6.0_06"
    Java(TM) SE Runtime Environment (build 1.6.0_06-b02)
    Java HotSpot(TM) Client VM (build 10.0-b22, mixed mode, sharing)

> Also, is this easily reproducible?  How many threads are you adding
> documents with?  What is your Auto Commit setting?

I think it takes 12-24 hours of indexing before the index breaks, so
while I did reproduce it once, I haven't yet tried again. My intuition is
that repeating the same procedure would produce the same problem. Of course,
what would be nice is if I could figure out how to reproduce it more
quickly, with a smaller index, and a simpler schema.

I'm adding documents with 5-10 threads. Since I'm using the rich
document update handler
(https://issues.apache.org/jira/browse/SOLR-284), there's going to be
PDF and HTML conversion going on within Solr alongside the normal
analysis and indexing.

Autocommit is:

    <autoCommit>
      <maxDocs>100000</maxDocs>
      <maxTime>1800000</maxTime>  <!-- 30 min -->
    </autoCommit>

> Can you try Lucene's CheckIndex tool on it and report what it says?

Working on that now. It should take some time, though, due to the index size.
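
In the meantime, here is a rough sketch of how the "file not found"
entries could be pulled out of the log and checked against the disk. It
is not part of Solr or Lucene; the function names are made up for
illustration, and the regex assumes the message format shown in the log
records quoted below.

```python
import os
import re

# Matches messages like the ones in the quoted log records, e.g.
# "java.io.FileNotFoundException: /path/_p7.fdt (No such file or directory)".
# \s+ is used so the match survives the line wrapping seen in the log.
MISSING_RE = re.compile(
    r"FileNotFoundException:\s+(\S+)\s+\(No such file or\s+directory\)"
)


def missing_paths_in_log(log_text):
    """Return the distinct paths reported as missing, in first-seen order."""
    seen = []
    for path in MISSING_RE.findall(log_text):
        if path not in seen:
            seen.append(path)
    return seen


def still_missing(paths):
    """Return the subset of paths that do not exist on disk right now."""
    return [p for p in paths if not os.path.exists(p)]
```

Running still_missing(missing_paths_in_log(text)) over the log contents
would show which of the reported files are still absent from the index
directory. It says nothing about why they are absent, so it is no
substitute for CheckIndex.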

>
> On Aug 15, 2008, at 1:35 PM, Chris Harris wrote:
>
>> I have an index (different from the ones mentioned yesterday) that was
>> working fine with 3M docs or so, but when I added a bunch more docs,
>> bringing it closer to 4M docs, the index seemed to get corrupted. In
>> particular, now when I start Solr up, or when my indexing process
>> tries to add a document, I get a complaint about missing index files.
>>
>> The error on startup looks like this:
>>
>> <record>
>>  <date>2008-08-15T10:18:54</date>
>>  <millis>1218820734592</millis>
>>  <sequence>92</sequence>
>>  <logger>org.apache.solr.core.MultiCore</logger>
>>  <level>SEVERE</level>
>>  <class>org.apache.solr.common.SolrException</class>
>>  <method>log</method>
>>  <thread>10</thread>
>>  <message>java.lang.RuntimeException: java.io.FileNotFoundException:
>> /ssd/solr-9999/solr/exhibitcore/data/index/_p7.fdt (No such file or
>> directory)
>>        at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:733)
>>        at org.apache.solr.core.SolrCore.&lt;init&gt;(SolrCore.java:387)
>>        at org.apache.solr.core.MultiCore.create(MultiCore.java:255)
>>        at org.apache.solr.core.MultiCore.load(MultiCore.java:139)
>>        at
>> org.apache.solr.servlet.SolrDispatchFilter.initMultiCore(SolrDispatchFilter.java:147)
>>        at
>> org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:75)
>>        at
>> org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:99)
>>        at
>> org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
>>        at
>> org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:594)
>>        at org.mortbay.jetty.servlet.Context.startContext(Context.java:139)
>>        at
>> org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1218)
>>        at
>> org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:500)
>>        at
>> org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:448)
>>        at
>> org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
>>        at
>> org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
>>        at
>> org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:161)
>>        at
>> org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
>>        at
>> org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
>>        at
>> org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
>>        at
>> org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:117)
>>        at org.mortbay.jetty.Server.doStart(Server.java:210)
>>        at
>> org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
>>        at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:929)
>>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>        at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>        at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>        at java.lang.reflect.Method.invoke(Method.java:616)
>>        at org.mortbay.start.Main.invokeMain(Main.java:183)
>>        at org.mortbay.start.Main.start(Main.java:497)
>>        at org.mortbay.start.Main.main(Main.java:115)
>> Caused by: java.io.FileNotFoundException:
>> /ssd/solr-9999/solr/exhibitcore/data/index/_p7.fdt (No such file or
>> directory)
>>        at java.io.RandomAccessFile.open(Native Method)
>>        at java.io.RandomAccessFile.&lt;init&gt;(RandomAccessFile.java:233)
>>        at
>> org.apache.lucene.store.FSDirectory$FSIndexInput$Descriptor.&lt;init&gt;(FSDirectory.java:506)
>>        at
>> org.apache.lucene.store.FSDirectory$FSIndexInput.&lt;init&gt;(FSDirectory.java:536)
>>        at
>> org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:445)
>>        at
>> org.apache.lucene.index.FieldsReader.&lt;init&gt;(FieldsReader.java:75)
>>        at
>> org.apache.lucene.index.SegmentReader.initialize(SegmentReader.java:308)
>>        at
>> org.apache.lucene.index.SegmentReader.get(SegmentReader.java:262)
>>        at
>> org.apache.lucene.index.SegmentReader.get(SegmentReader.java:197)
>>        at
>> org.apache.lucene.index.MultiSegmentReader.&lt;init&gt;(MultiSegmentReader.java:55)
>>        at
>> org.apache.lucene.index.DirectoryIndexReader$1.doBody(DirectoryIndexReader.java:75)
>>        at
>> org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:636)
>>        at
>> org.apache.lucene.index.DirectoryIndexReader.open(DirectoryIndexReader.java:63)
>>        at org.apache.lucene.index.IndexReader.open(IndexReader.java:209)
>>        at org.apache.lucene.index.IndexReader.open(IndexReader.java:173)
>>        at
>> org.apache.solr.search.SolrIndexSearcher.&lt;init&gt;(SolrIndexSearcher.java:93)
>>        at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:724)
>>        ... 29 more
>> </message>
>> </record>
>>
>> And the error on doc add looks like this:
>>
>> <record>
>>  <date>2008-08-15T09:51:30</date>
>>  <millis>1218819090142</millis>
>>  <sequence>6571937</sequence>
>>  <logger>org.apache.solr.core.SolrCore</logger>
>>  <level>SEVERE</level>
>>  <class>org.apache.solr.common.SolrException</class>
>>  <method>log</method>
>>  <thread>14</thread>
>>  <message>java.io.FileNotFoundException:
>> /ssd/solr-9999/solr/exhibitcore/data/index/_p7.fdt (No such file or
>> directory)
>>        at java.io.RandomAccessFile.open(Native Method)
>>        at java.io.RandomAccessFile.&lt;init&gt;(RandomAccessFile.java:233)
>>        at
>> org.apache.lucene.store.FSDirectory$FSIndexInput$Descriptor.&lt;init&gt;(FSDirectory.java:506)
>>        at
>> org.apache.lucene.store.FSDirectory$FSIndexInput.&lt;init&gt;(FSDirectory.java:536)
>>        at
>> org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:445)
>>        at
>> org.apache.lucene.index.FieldsReader.&lt;init&gt;(FieldsReader.java:75)
>>        at
>> org.apache.lucene.index.SegmentReader.initialize(SegmentReader.java:308)
>>        at
>> org.apache.lucene.index.SegmentReader.get(SegmentReader.java:262)
>>        at
>> org.apache.lucene.index.SegmentReader.get(SegmentReader.java:197)
>>        at
>> org.apache.lucene.index.MultiSegmentReader.&lt;init&gt;(MultiSegmentReader.java:55)
>>        at
>> org.apache.lucene.index.DirectoryIndexReader$1.doBody(DirectoryIndexReader.java:75)
>>        at
>> org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:636)
>>        at
>> org.apache.lucene.index.DirectoryIndexReader.open(DirectoryIndexReader.java:63)
>>        at org.apache.lucene.index.IndexReader.open(IndexReader.java:209)
>>        at org.apache.lucene.index.IndexReader.open(IndexReader.java:173)
>>        at
>> org.apache.solr.search.SolrIndexSearcher.&lt;init&gt;(SolrIndexSearcher.java:93)
>>        at org.apache.solr.core.SolrCore.newSearcher(SolrCore.java:213)
>>        at
>> org.apache.solr.update.DirectUpdateHandler2.openSearcher(DirectUpdateHandler2.java:207)
>>        at
>> org.apache.solr.update.DirectUpdateHandler2.doDeletions(DirectUpdateHandler2.java:466)
>>        at
>> org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:295)
>>        at
>> org.apache.solr.handler.RichDocumentLoader.doAdd(RichDocumentRequestHandler.java:231)
>>        at
>> org.apache.solr.handler.RichDocumentLoader.addDoc(RichDocumentRequestHandler.java:236)
>>        at
>> org.apache.solr.handler.RichDocumentLoader.load(RichDocumentRequestHandler.java:278)
>>        at
>> org.apache.solr.handler.RichDocumentRequestHandler.handleRequestBody(RichDocumentRequestHandler.java:80)
>>        at
>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:125)
>>        at
>> org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:228)
>>        at org.apache.solr.core.SolrCore.execute(SolrCore.java:965)
>>        at
>> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:339)
>>        at
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:274)
>>        at
>> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
>>        at
>> org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
>>        at
>> org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>>        at
>> org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
>>        at
>> org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
>>        at
>> org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
>>        at
>> org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
>>        at
>> org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
>>        at
>> org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
>>        at org.mortbay.jetty.Server.handle(Server.java:285)
>>        at
>> org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
>>        at
>> org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835)
>>        at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641)
>>        at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:202)
>>        at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
>>        at
>> org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
>>        at
>> org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
>> </message>
>> </record>
>>
>> I just checked, and the files that Solr is complaining about are
>> indeed not in the index directory.
>>
>> The earliest indication of trouble I found in my log was an error like
>> this:
>>
>> <record>
>>  <date>2008-08-15T09:47:48</date>
>>  <millis>1218818868528</millis>
>>  <sequence>6525387</sequence>
>>  <logger>org.apache.solr.update.UpdateHandler</logger>
>>  <level>SEVERE</level>
>>  <class>org.apache.solr.update.DirectUpdateHandler2$CommitTracker</class>
>>  <method>run</method>
>>  <thread>15</thread>
>>  <message>auto commit error...</message>
>> </record>
>>
>> There may have been SEVERE errors before this, but my log doesn't go
>> back to the very beginning.
>>
>> It's interesting that while adding documents usually seems to fail
>> now (yielding the "file not found" exception), I could add
>> documents successfully for some time before things started to go
>> wrong. What's more, some documents do seem to *still* get added
>> successfully. I'm using the rich document update handler, so the
>> successful log entries look like this:
>>
>> <record>
>>  <date>2008-08-15T09:50:54</date>
>>  <millis>1218819054600</millis>
>>  <sequence>6561534</sequence>
>>  <logger>org.apache.solr.core.SolrCore</logger>
>>  <level>INFO</level>
>>  <class>org.apache.solr.core.SolrCore</class>
>>  <method>execute</method>
>>  <thread>14</thread>
>>  <message>[exhibitcore] webapp=/solr path=/update/rich
>>
>> params={filenumber=333-112076-85&amp;formtype=S-4/A&amp;stream.fieldname=body&amp;exhibittype=EX-3.99&amp;date=2004-02-09T00:00:00Z&amp;companyname=PROGRESSIVE+VENTURE+CAPITAL+CORP&amp;exhibitdescription=EXHIBIT+3.99&amp;id=37684831&amp;cik=1275089&amp;stream.type=html&amp;filingkey=0001193125-04-017196/1275089/FILER&amp;stateofincorporation=WV&amp;fieldnames=key,filingkey,companyname,accessionnumber,cik,date,exhibitdescription,exhibittype,exhibittypeint,filenumber,filename,formtype,stateofheadquarters,stateofincorporation&amp;filename=dex399.htm&amp;exhibittypeint=3&amp;accessionnumber=0001193125-04-017196&amp;stateofheadquarters=~&amp;key=0001193125-04-017196/1275089/FILER/dex399.htm}
>> status=0 QTime=9 </message>
>> </record>
>>
>> The deletes I'm seeing in my log also seem to be working fine; I get
>> log entries like
>>
>> <record>
>>  <date>2008-08-15T09:50:54</date>
>>  <millis>1218819054602</millis>
>>  <sequence>6561535</sequence>
>>  <logger>org.apache.solr.update.processor.UpdateRequestProcessor</logger>
>>  <level>INFO</level>
>>  <class>org.apache.solr.update.processor.LogUpdateProcessor</class>
>>  <method>finish</method>
>>  <thread>14</thread>
>>  <message>{delete=[0001193125-04-017196/1275096/FILER/dex231.htm]} 0
>> 1</message>
>> </record>
>>
>> and
>>
>> <record>
>>  <date>2008-08-15T09:51:30</date>
>>  <millis>1218819090153</millis>
>>  <sequence>6571944</sequence>
>>  <logger>org.apache.solr.update.UpdateHandler</logger>
>>  <level>INFO</level>
>>  <class>org.apache.solr.update.DirectUpdateHandler2</class>
>>  <method>doDeletions</method>
>>  <thread>13</thread>
>>  <message>DirectUpdateHandler2 deleting and removing dups for 100788
>> ids</message>
>> </record>
>>
>> After I noticed this corruption thing, I thought I'd see if I could
>> get it to happen again, so I went back to the original 3M-ish doc
>> index, and tried adding the new documents again. (If it matters, the
>> new docs would have come into the index in a different permutation on
>> this retry.) This too resulted in an index with "file not found"
>> problems.
>>
>> The following may or may not be relevant: I built the base 3M-ish doc
>> index on a Windows machine, and it's a compound (.cfs) format index.
>> (I actually created it not with Solr, but by using the index merging
>> tool that comes with Lucene in order to merge three different
>> non-compound format indexes that I'd previously made with Solr into a
>> single index.) Before I started adding documents, I moved the index to
>> a Linux machine running a newer version of Solr/Lucene than was on the
>> Windows machine. The stuff described above all happened on Linux.
>>
>> Any thoughts?
>>
>> Thanks a bunch,
>> Chris
>
>
