Thanks for the report, Aaron. This definitely looks like a Lucene bug, and I've opened https://issues.apache.org/jira/browse/LUCENE-1995. Can you follow up there? I asked about your index settings.
-Yonik
http://www.lucidimagination.com

On Mon, Oct 19, 2009 at 3:04 PM, Aaron McKee <ucbmc...@gmail.com> wrote:
> I was wondering if anyone might have any insight on the following problem.
> I'm using the latest Solr code from SVN and indexing around 17m XML records
> via DIH. With perfect replicability, the following exception is thrown on
> the same aggregate file (#236, and each XML file has ~50k records), although
> not necessarily on the same exact record. Oddly, it doesn't appear to be due
> to anything in the file: if I change the ordering or just index the file
> alone, it works fine.
>
> java.lang.ArrayIndexOutOfBoundsException: -65536
>         at org.apache.lucene.index.TermsHashPerField.writeByte(TermsHashPerField.java:479)
>         at org.apache.lucene.index.TermsHashPerField.writeVInt(TermsHashPerField.java:502)
>         at org.apache.lucene.index.FreqProxTermsWriterPerField.addTerm(FreqProxTermsWriterPerField.java:130)
>         at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:467)
>         at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:174)
>         at org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:244)
>         at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:772)
>         at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:755)
>         at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2611)
>         at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2583)
>         at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:241)
>         at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
>         at org.apache.solr.ask_geo.update.GeoUpdateProcessor.processAdd(GeoUpdateProcessor.java:75)
>         at org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:75)
>         at org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:292)
>         at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:392)
>         at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:383)
>         at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:242)
>         at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:180)
>         at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:331)
>         at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:389)
>         at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:370)
>
> The related Lucene code is a bit thick and I'm having a hard time figuring
> out what could be going on here. I've added a bit of debug output to some of
> the intermediary classes, and it looks like the exception is generally being
> thrown while processing one of my dynamic fields (type=tdouble, indexed=t,
> stored=f). The GeoUpdateProcessor code referenced above is my own, but it is
> essentially the same as the LocalSolr update processor; it just contains a
> few lines of code that calculate a double value from two document fields
> and then store that value in one of these dynamic fields. It hasn't caused
> any problems before and only interacts with the underlying framework via
> cmd.getSolrInputDocument(), doc.getFieldValue(string), doc.addField(string,
> double), and next.processAdd(cmd), and I've generated a number of indexes
> with it in the past, so I don't -think- it's a likely culprit. I've tried
> a run without the update processor and the problem seemed to go away (it
> made it past the above file, at least), but that changes so many other
> factors that I don't know how much it really tells me (it reduces the field
> count by ~13 fields, eliminates all dynamic fields, etc.).
>
> The only other thing worth mentioning is that I've replaced the Solr trunk
> Lucene jars with my own compiled versions, based off 2.9.0. The only thing
> different versus the 'stable' release is that it includes a few additional
> libraries (no core or contrib classes were modified). I haven't heard of any
> check-ins between 2.9.0 and 2.9.1-dev that should affect this...
>
> Has anyone else run into a problem like this before?
>
> Thanks,
> Aaron
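For readers unfamiliar with Solr's update chain, the processor Aaron describes does four things: get the input document from the command, read two field values, add one computed double to a dynamic field, and hand the command to the next processor. The sketch below is a dependency-free illustration of that pattern only, not his code. `MiniDoc` and `MiniProcessor` are hypothetical stand-ins for Solr's `SolrInputDocument` and `UpdateRequestProcessor`, the field names `lat`, `lng` and `geo_0_d` are invented, and the formula is a placeholder because the post does not say what is computed.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.DoubleBinaryOperator;

/**
 * Dependency-free sketch of the update-processor pattern described in the thread.
 * MiniDoc and MiniProcessor are hypothetical stand-ins, NOT Solr classes.
 */
public class GeoProcessorSketch {

    /** Stand-in for SolrInputDocument: each field name maps to a list of values. */
    static final class MiniDoc {
        private final Map<String, List<Object>> fields = new LinkedHashMap<>();

        /** Returns the first value of the field, or null if the field is absent. */
        Object getFieldValue(String name) {
            List<Object> values = fields.get(name);
            return (values == null || values.isEmpty()) ? null : values.get(0);
        }

        /** Appends a value, creating the field on first use (multi-valued). */
        void addField(String name, Object value) {
            fields.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
        }
    }

    /** Stand-in for one link in an update processor chain. */
    interface MiniProcessor {
        void processAdd(MiniDoc doc);
    }

    /**
     * Hypothetical geo processor: reads two numeric fields, derives one double,
     * stores it in a dynamic field, then always hands the document to `next`.
     */
    static final class GeoProcessor implements MiniProcessor {
        private final MiniProcessor next;
        private final String fieldA;
        private final String fieldB;
        private final String targetField;
        private final DoubleBinaryOperator formula; // placeholder: the real formula is not in the post

        GeoProcessor(MiniProcessor next, String fieldA, String fieldB,
                     String targetField, DoubleBinaryOperator formula) {
            this.next = next;
            this.fieldA = fieldA;
            this.fieldB = fieldB;
            this.targetField = targetField;
            this.formula = formula;
        }

        @Override
        public void processAdd(MiniDoc doc) {
            Object a = doc.getFieldValue(fieldA);
            Object b = doc.getFieldValue(fieldB);
            if (a instanceof Number && b instanceof Number) {
                double value = formula.applyAsDouble(((Number) a).doubleValue(),
                                                     ((Number) b).doubleValue());
                doc.addField(targetField, value); // autoboxed to Double
            }
            next.processAdd(doc); // continue the chain even when inputs are missing
        }
    }

    public static void main(String[] args) {
        List<MiniDoc> indexed = new ArrayList<>();
        // indexed::add stands in for the final indexing step (RunUpdateProcessor in the trace).
        MiniProcessor chain = new GeoProcessor(indexed::add, "lat", "lng", "geo_0_d",
                                               (a, b) -> a + b);

        MiniDoc doc = new MiniDoc();
        doc.addField("lat", 37.5);
        doc.addField("lng", -122.25);
        chain.processAdd(doc);

        System.out.println(indexed.get(0).getFieldValue("geo_0_d")); // prints -84.75
    }
}
```

In this sketch the processor only appends a field and delegates; it never touches the index itself. That matches the trace, where the exception is raised further down, inside `IndexWriter.updateDocument`.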