I was wondering if anyone might have any insight on the following problem. I'm using the latest Solr code from SVN and indexing around 17m XML records via DIH. The following exception is thrown, perfectly reproducibly, on the same aggregate file (#236; each XML file holds ~50k records), although not necessarily on the same exact record. Oddly, it doesn't appear to be caused by anything in the file itself: if I change the file ordering or index that file alone, it works fine.

java.lang.ArrayIndexOutOfBoundsException: -65536
    at org.apache.lucene.index.TermsHashPerField.writeByte(TermsHashPerField.java:479)
    at org.apache.lucene.index.TermsHashPerField.writeVInt(TermsHashPerField.java:502)
    at org.apache.lucene.index.FreqProxTermsWriterPerField.addTerm(FreqProxTermsWriterPerField.java:130)
    at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:467)
    at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:174)
    at org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:244)
    at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:772)
    at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:755)
    at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2611)
    at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2583)
    at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:241)
    at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
    at org.apache.solr.ask_geo.update.GeoUpdateProcessor.processAdd(GeoUpdateProcessor.java:75)
    at org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:75)
    at org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:292)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:392)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:383)
    at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:242)
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:180)
    at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:331)
    at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:389)
    at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:370)

The related Lucene code is fairly dense and I'm having a hard time figuring out what could be going on. I've added some debug output to a few of the intermediary classes, and the exception is generally thrown while processing one of my dynamic fields (type=tdouble, indexed=true, stored=false). The GeoUpdateProcessor referenced above is my own code, but it's essentially the same as the LocalSolr update processor: a few lines that calculate a double value from two document fields and store it in one of these dynamic fields. It hasn't caused any problems before, it only touches the framework via cmd.getSolrInputDocument(), doc.getFieldValue(String), doc.addField(String, double), and next.processAdd(cmd), and I've built a number of indexes with it in the past, so I don't -think- it's the likely culprit. A run without the update processor did seem to make the problem go away (it made it past the file above, at least), but that changes so many other factors (~13 fewer fields, no dynamic fields at all, etc.) that I don't know how much it really tells me.
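In case it helps, here's a stripped-down sketch of what the processor's processAdd() does. The field names ("lat", "lng", "_geo_d") and the calculation are placeholder stand-ins, not the actual code; the real processor just follows this same shape:

import java.io.IOException;

import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;

public class GeoUpdateProcessor extends UpdateRequestProcessor {

  public GeoUpdateProcessor(UpdateRequestProcessor next) {
    super(next);
  }

  @Override
  public void processAdd(AddUpdateCommand cmd) throws IOException {
    SolrInputDocument doc = cmd.getSolrInputDocument();

    // Read two existing document fields (hypothetical names).
    Object latVal = doc.getFieldValue("lat");
    Object lngVal = doc.getFieldValue("lng");

    if (latVal != null && lngVal != null) {
      double lat = Double.parseDouble(latVal.toString());
      double lng = Double.parseDouble(lngVal.toString());

      // Derive a double and store it in a tdouble dynamic field
      // (indexed=true, stored=false in the schema; name is a placeholder).
      doc.addField("_geo_d", computeGeoValue(lat, lng));
    }

    // Pass the (possibly modified) document down the chain,
    // i.e. next.processAdd(cmd) if a next processor exists.
    super.processAdd(cmd);
  }

  private double computeGeoValue(double lat, double lng) {
    // Placeholder for the actual calculation.
    return lat * lng;
  }
}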

The only other thing worth mentioning is that I've replaced the Solr trunk Lucene jars with my own compiled versions, based on 2.9.0. The only difference from the stable 2.9.0 release is a few additional libraries (no core or contrib classes were modified). I haven't heard of any check-ins between 2.9.0 and 2.9.1-dev that should affect this...

Has anyone else run into a problem like this before?

Thanks,
Aaron
