Hi Ian,

I sent this to java-user, but maybe you didn't see it, so let's try again on solr-user:


It looks like your stored fields file (_X.fdt) is corrupt.
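
To decode the exception below: during the merge, FieldsReader reads a
field number out of the .fdt for each stored document and looks it up
in the segment's FieldInfos (an ArrayList under the hood), so a corrupt
.fdt can yield a bogus field number like 105 when the segment only
knows about 33 fields -- it is not an array you are sending in. A toy
illustration of the failure mode (not Lucene source, just the shape of
it):

    import java.util.ArrayList;
    import java.util.List;

    public class FdtLookupDemo {
      public static void main(String[] args) {
        List<String> fieldInfos = new ArrayList<String>();
        for (int i = 0; i < 33; i++) {    // the segment knows 33 fields
          fieldInfos.add("field" + i);
        }
        int fieldNumber = 105;   // bogus value read from a corrupt .fdt
        fieldInfos.get(fieldNumber);
        // -> java.lang.IndexOutOfBoundsException: Index: 105, Size: 33
      }
    }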

Are you using multiple threads to add docs?

Can you try switching to SerialMergeScheduler to verify it's reproducible?
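
In straight Lucene that is a one-liner on the writer; with Solr you
would need to wire it in wherever the IndexWriter is created (check
whether your solrconfig.xml exposes a mergeScheduler option -- that
depends on the Solr version). A minimal sketch against the Lucene
2.4-era API, with a hypothetical index path:

    import java.io.IOException;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.SerialMergeScheduler;
    import org.apache.lucene.store.FSDirectory;

    public class SerialMergeDemo {
      public static void main(String[] args) throws IOException {
        IndexWriter writer = new IndexWriter(
            FSDirectory.getDirectory("/path/to/index"),  // hypothetical
            new StandardAnalyzer(),
            IndexWriter.MaxFieldLength.UNLIMITED);
        // Run merges serially on the calling thread instead of
        // ConcurrentMergeScheduler's background threads:
        writer.setMergeScheduler(new SerialMergeScheduler());
        // ... addDocument / close as usual ...
        writer.close();
      }
    }

If the corruption goes away under SerialMergeScheduler, that points at
a concurrency problem in the merge path.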

When you hit this exception, can you stop Solr and then run Lucene's
CheckIndex tool (org.apache.lucene.index.CheckIndex) to verify the
index is corrupt and see which segment it is?  Then post back the
exception and "ls -l" of your index directory?

If you could post the client-side code you're using to build & submit
docs to Solr, and if I can get access to the Medline content and
reproduce the bug, then I'll track it down...

Mike

On Aug 14, 2008, at 10:18 PM, Ian Connor wrote:

I seem to be able to reproduce this very easily, and the data is
Medline (so I am sure I can share it if needed, with a quick email to
check).

- I am using Fedora:
%uname -a
Linux ghetto5.projectlounge.com 2.6.23.1-42.fc8 #1 SMP Tue Oct 30
13:18:33 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux
%java -version
java version "1.7.0"
IcedTea Runtime Environment (build 1.7.0-b21)
IcedTea 64-Bit Server VM (build 1.7.0-b21, mixed mode)
- single core (will use shards, but each machine just has one HDD, so I
didn't see how multiple cores would help -- but I am new at this)
- next run I will keep the output to check for earlier errors
- very, and I can share code + data if that will help

On Thu, Aug 14, 2008 at 4:23 PM, Yonik Seeley <[EMAIL PROTECTED]> wrote:
Yikes... not good.  This shouldn't be due to anything you did wrong,
Ian... it looks like a Lucene bug.

Some questions:
- what platform are you running on, and what JVM?
- are you using multicore? (I fixed some index locking bugs recently)
- are there any exceptions in the log before this?
- how reproducible is this?

-Yonik

On Thu, Aug 14, 2008 at 2:47 PM, Ian Connor <[EMAIL PROTECTED]> wrote:
Hi,

I have rebuilt my index a few times (it should get up to about 4
million documents, but around 1 million it starts to fall apart).

Exception in thread "Lucene Merge Thread #0"
org.apache.lucene.index.MergePolicy$MergeException:
java.lang.IndexOutOfBoundsException: Index: 105, Size: 33
      at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:323)
      at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:300)
Caused by: java.lang.IndexOutOfBoundsException: Index: 105, Size: 33
      at java.util.ArrayList.rangeCheck(ArrayList.java:572)
      at java.util.ArrayList.get(ArrayList.java:350)
      at org.apache.lucene.index.FieldInfos.fieldInfo(FieldInfos.java:260)
      at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:188)
      at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:670)
      at org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:349)
      at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:134)
      at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3998)
      at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3650)
      at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:214)
      at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:269)


When this happens, the disk usage goes right up and the indexing
really starts to slow down. I am using a Solr build from about a week
ago - so my Lucene is at 2.4 according to the war files.

Has anyone seen this error before? Is it possible to tell which array
is too large? Would it be an array I am sending in, or an internal
one?

Regards,
Ian Connor





--
Regards,

Ian Connor
