It looks like it was just RAM. I purchased a PHD PCI2 from Ultra-X to test all my RAM, and some modules were just plain bad (some were bad right away and others needed to warm up before failing - I will be testing all my RAM from now on).
I have re-indexed many times since then and have not seen the problem
again. So it looks like it was just bad hardware - sorry about the
confusion.

On Mon, Aug 18, 2008 at 8:29 AM, Michael McCandless <[EMAIL PROTECTED]> wrote:
>
> OK gotchya. Please keep us posted one way or another...
>
> Mike
>
> Ian Connor wrote:
>
>> Hi Mike,
>>
>> I am currently ruling out some bad memory modules. Knowing that this
>> is an index corruption makes memory corruption more likely. If
>> replacing the RAM does not fix the problem (which I need to do anyway
>> due to segmentation faults), I will package up the crash into a
>> reproducible scenario.
>>
>> On Mon, Aug 18, 2008 at 5:56 AM, Michael McCandless
>> <[EMAIL PROTECTED]> wrote:
>>>
>>> Hi Ian,
>>>
>>> I sent this to java-user, but maybe you didn't see it, so let's try
>>> again on solr-user:
>>>
>>> It looks like your stored fields file (_X.fdt) is corrupt.
>>>
>>> Are you using multiple threads to add docs?
>>>
>>> Can you try switching to SerialMergeScheduler to verify it's
>>> reproducible?
>>>
>>> When you hit this exception, can you stop Solr and then run Lucene's
>>> CheckIndex tool (org.apache.lucene.index.CheckIndex) to verify the
>>> index is corrupt and see which segment it is? Then post back the
>>> exception and "ls -l" of your index directory?
>>>
>>> If you could post the client-side code you're using to build & submit
>>> docs to Solr, and if I can get access to the Medline content and can
>>> repro the bug, then I'll track it down...
>>>
>>> Mike
>>>
>>> On Aug 14, 2008, at 10:18 PM, Ian Connor wrote:
>>>
>>>> I seem to be able to reproduce this very easily, and the data is
>>>> Medline (so I am sure I can share it if needed, with a quick email to
>>>> check).
>>>>
>>>> - I am using Fedora:
>>>> % uname -a
>>>> Linux ghetto5.projectlounge.com 2.6.23.1-42.fc8 #1 SMP Tue Oct 30
>>>> 13:18:33 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux
>>>> % java -version
>>>> java version "1.7.0"
>>>> IcedTea Runtime Environment (build 1.7.0-b21)
>>>> IcedTea 64-Bit Server VM (build 1.7.0-b21, mixed mode)
>>>> - single core (will use shards, but each machine has just one HDD, so
>>>> I didn't see how cores would help - I am new at this)
>>>> - next run I will keep the output to check for earlier errors
>>>> - very reproducible, and I can share code + data if that will help
>>>>
>>>> On Thu, Aug 14, 2008 at 4:23 PM, Yonik Seeley <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>> Yikes... not good. This shouldn't be due to anything you did wrong,
>>>>> Ian... it looks like a Lucene bug.
>>>>>
>>>>> Some questions:
>>>>> - what platform are you running on, and what JVM?
>>>>> - are you using multicore? (I fixed some index locking bugs recently)
>>>>> - are there any exceptions in the log before this?
>>>>> - how reproducible is this?
>>>>>
>>>>> -Yonik
>>>>>
>>>>> On Thu, Aug 14, 2008 at 2:47 PM, Ian Connor <[EMAIL PROTECTED]> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I have rebuilt my index a few times (it should get up to about 4
>>>>>> million documents, but around 1 million it starts to fall apart).
>>>>>>
>>>>>> Exception in thread "Lucene Merge Thread #0"
>>>>>> org.apache.lucene.index.MergePolicy$MergeException:
>>>>>> java.lang.IndexOutOfBoundsException: Index: 105, Size: 33
>>>>>>     at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:323)
>>>>>>     at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:300)
>>>>>> Caused by: java.lang.IndexOutOfBoundsException: Index: 105, Size: 33
>>>>>>     at java.util.ArrayList.rangeCheck(ArrayList.java:572)
>>>>>>     at java.util.ArrayList.get(ArrayList.java:350)
>>>>>>     at org.apache.lucene.index.FieldInfos.fieldInfo(FieldInfos.java:260)
>>>>>>     at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:188)
>>>>>>     at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:670)
>>>>>>     at org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:349)
>>>>>>     at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:134)
>>>>>>     at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3998)
>>>>>>     at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3650)
>>>>>>     at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:214)
>>>>>>     at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:269)
>>>>>>
>>>>>> When this happens, the disk usage goes right up and the indexing
>>>>>> really starts to slow down. I am using a Solr build from about a
>>>>>> week ago - so my Lucene is at 2.4 according to the war files.
>>>>>>
>>>>>> Has anyone seen this error before? Is it possible to tell which
>>>>>> array is too large? Would it be an array I am sending in or another
>>>>>> internal one?
>>>>>>
>>>>>> Regards,
>>>>>> Ian Connor
>>>>>
>>>>
>>>> --
>>>> Regards,
>>>>
>>>> Ian Connor
>>>
>>
>> --
>> Regards,
>>
>> Ian Connor
>

--
Regards,

Ian Connor
1 Leighton St #605
Cambridge, MA 02141
Direct Line: +1 (978) 633 3372
Call Center Phone: +1 (714) 239 3875 (24 hrs)
Mobile Phone: +1 (312) 218 3209
Fax: +1 (770) 818 5697
Suisse Phone: +41 (0) 22 548 1664
Skype: ian.connor
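
For anyone hitting the same exception: Mike's CheckIndex suggestion above
can be run from a shell once Solr is stopped. A minimal sketch of the
invocation - the jar name and index path here are assumptions that depend
on your build and install layout:

    # stop Solr first so no writer holds the index lock
    java -cp lucene-core-2.4.jar org.apache.lucene.index.CheckIndex \
        /path/to/solr/data/index

CheckIndex walks each segment, reports document counts, and prints the
exception for any segment it cannot read, which identifies the corrupt
segment to post back to the list along with "ls -l" of the directory.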
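The SerialMergeScheduler switch is a solrconfig.xml change. A sketch
assuming Solr 1.3-era syntax, where the scheduler class is given as
element text under <indexDefaults> (later releases name it via a class
attribute instead):

    <indexDefaults>
      <!-- other index settings unchanged -->
      <mergeScheduler>org.apache.lucene.index.SerialMergeScheduler</mergeScheduler>
    </indexDefaults>

With the serial scheduler, merges run synchronously in the indexing
thread; if the corruption disappears under it, that points at a
concurrency bug in the merge path rather than bad input data.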