Thanks for the report, Aaron. This definitely looks like a Lucene bug,
and I've opened
https://issues.apache.org/jira/browse/LUCENE-1995
Can you follow up there? (I asked about your index settings.)

-Yonik
http://www.lucidimagination.com



On Mon, Oct 19, 2009 at 3:04 PM, Aaron McKee <ucbmc...@gmail.com> wrote:
> I was wondering if anyone might have any insight on the following problem.
> I'm using the latest Solr code from SVN and indexing around 17m XML records
> via DIH. With perfect replicability, the following exception is thrown on
> the same aggregate file (#236, and each XML file has ~50k records), although
> not necessarily the same exact record. Oddly, it doesn't appear to be due to
> anything in the file - if I change the ordering or just index the file
> alone, it works fine.
>
> java.lang.ArrayIndexOutOfBoundsException: -65536
>       at org.apache.lucene.index.TermsHashPerField.writeByte(TermsHashPerField.java:479)
>       at org.apache.lucene.index.TermsHashPerField.writeVInt(TermsHashPerField.java:502)
>       at org.apache.lucene.index.FreqProxTermsWriterPerField.addTerm(FreqProxTermsWriterPerField.java:130)
>       at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:467)
>       at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:174)
>       at org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:244)
>       at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:772)
>       at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:755)
>       at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2611)
>       at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2583)
>       at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:241)
>       at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
>       at org.apache.solr.ask_geo.update.GeoUpdateProcessor.processAdd(GeoUpdateProcessor.java:75)
>       at org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:75)
>       at org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:292)
>       at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:392)
>       at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:383)
>       at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:242)
>       at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:180)
>       at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:331)
>       at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:389)
>       at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:370)
>
> The related Lucene code is a bit thick and I'm having a hard time figuring
> out what could be going on here. I've added a bit of debug output to some of
> the intermediary classes and it looks like the exception is generally being
> thrown while processing one of my dynamic fields (type=tdouble, indexed=t,
> stored=f). The GeoUpdateProcessor code referenced above is my own, but
> essentially is the same as the LocalSolr update processor; it just contains
> a few lines of code that calculates a double value from two document fields
> and then stores that value in one of these dynamic fields. It hasn't caused
> any previous problems, only interacts with the underlying framework via
> cmd.getSolrInputDocument(), doc.getFieldValue(string), doc.addField(string,
> double), and next.processAdd(cmd), and I've generated a number of indexes
> with it in the past, so I don't -think- that's a likely culprit. I've tried
> a run without the update processor and the problem seemed to go away (it
> made it past the above file, at least), but then this changes so many other
> factors that I don't know how much that really tells me (reduces field count
> by ~13 fields, eliminates all dynamic fields, etc.).
>
> The only other thing worth mentioning is that I've replaced the Solr trunk
> Lucene jars with my own compiled versions, based off 2.9.0. The only thing
> different versus the 'stable' release is that it includes a few additional
> libraries (no core or contrib classes were modified). I haven't heard of any
> check-ins between 2.9.0 and 2.9.1-dev that should affect this...
>
> Has anyone else run into a problem like this before?
>
> Thanks,
> Aaron
>
>
