Hi,

We are using Solr 7.4, and at one of our customers we are encountering the same problem as described on StackOverflow: https://stackoverflow.com/questions/52783491/solr-indexing-error-possible-analysis-error (no solution provided).

From our side, this is what happens: while indexing documents with Solr 7.4, this exception appears at random. It is very annoying because we cannot reproduce it on our development machine and we don't have access to the customer's data. So far it has happened on pdf, xls, and eml documents (but never twice on the same document). It looks like the issue stems from the FlattenGraphFilter class, specifically from restoreState(inputNode.tokens.get(inputNode.nextOut)).
The complete stack trace shows:

2018-12-20 22:09:04.140 ERROR (qtp1990098664-22) [   x:EM] o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: Exception writing document id 0acb5bf3-4caa-4f31-ab8f-76a8ea7ce782 to the index; possible analysis error.
    at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:246)
    at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:67)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:950)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1168)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:633)
    at org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:103)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
    at org.apache.solr.update.processor.AddSchemaFieldsUpdateProcessorFactory$AddSchemaFieldsUpdateProcessor.processAdd(AddSchemaFieldsUpdateProcessorFactory.java:475)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
    at org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:118)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
    at org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:118)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
    at org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:118)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
    at org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:118)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
    at org.apache.solr.update.processor.FieldNameMutatingUpdateProcessorFactory$1.processAdd(FieldNameMutatingUpdateProcessorFactory.java:75)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
    at org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:118)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
    at org.apache.solr.update.processor.AbstractDefaultValueUpdateProcessorFactory$DefaultValueUpdateProcessor.processAdd(AbstractDefaultValueUpdateProcessorFactory.java:92)
    at org.apache.solr.handler.loader.JavabinLoader$1.update(JavabinLoader.java:98)
    at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:188)
    at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readIterator(JavaBinUpdateRequestCodec.java:144)
    at org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:311)
    at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:256)
    at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readNamedList(JavaBinUpdateRequestCodec.java:130)
    at org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:276)
    at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:256)
    at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:178)
    at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:195)
    at org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(JavabinLoader.java:109)
    at org.apache.solr.handler.loader.JavabinLoader.load(JavabinLoader.java:55)
    at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:97)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:2539)
    at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:709)
    at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:515)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:377)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:323)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1634)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:533)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1595)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1253)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1564)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1155)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:219)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
    at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
    at org.eclipse.jetty.server.Server.handle(Server.java:531)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:352)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:260)
    at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:281)
    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:102)
    at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:118)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)
    at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:760)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:678)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
    at java.util.ArrayList.rangeCheck(ArrayList.java:657)
    at java.util.ArrayList.get(ArrayList.java:433)
    at org.apache.lucene.analysis.core.FlattenGraphFilter.releaseBufferedToken(FlattenGraphFilter.java:204)
    at org.apache.lucene.analysis.core.FlattenGraphFilter.incrementToken(FlattenGraphFilter.java:258)
    at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:738)
    at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:428)
    at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:392)
    at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:251)
    at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:494)
    at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1602)
    at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1594)
    at org.apache.solr.update.DirectUpdateHandler2.updateDocument(DirectUpdateHandler2.java:982)
    at org.apache.solr.update.DirectUpdateHandler2.updateDocOrDocValues(DirectUpdateHandler2.java:971)
    at org.apache.solr.update.DirectUpdateHandler2.doNormalUpdate(DirectUpdateHandler2.java:348)
    at org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:284)
    at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:234)
    ... 76 more
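
Since we have no access to the customer's data, one thing we can do is list which document ids are affected and how often. A minimal sketch, assuming the Solr log keeps the "Exception writing document id ... to the index; possible analysis error" wording shown in the trace; the function name failed_document_ids is hypothetical, not part of Solr:

```python
import re
from collections import Counter

# Message wording copied from the trace; the id is taken to be the
# whitespace-free token that follows "document id".
FAILED_DOC = re.compile(
    r"Exception writing document id (\S+) to the index; possible analysis error"
)


def failed_document_ids(log_lines):
    """Count how often each document id appears in an indexing-failure message."""
    counts = Counter()
    for line in log_lines:
        match = FAILED_DOC.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Any iterable of lines works as input, for example an open log file. Ids that show up more than once would hint at a content-dependent failure rather than a purely random one.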

contractions_fr.txt and stopwords_fr.txt are the stock ones; synonyms_fr.txt contains:
voiture,renault,peugeot

The schema used in managed-schema is:

<dynamicField name="*_txt_fra" type="text_fr" indexed="true" stored="false"/>
<fieldType name="text_fr" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!-- removes l', etc -->
        <filter class="solr.ElisionFilterFactory" ignoreCase="true" articles="lang/contractions_fr.txt"/>
        <!-- Separates on hyphen numbers, ... -->
        <filter class="solr.WordDelimiterGraphFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_fr.txt" format="snowball"/>
        <filter class="solr.FrenchLightStemFilterFactory"/>
        <filter class="solr.FlattenGraphFilterFactory"/>
    </analyzer>
    <analyzer type="query">
        <!-- query analyzer omitted for clarity -->
    </analyzer>
</fieldType>

Unlike in the StackOverflow question, we use only a single FlattenGraphFilter, at the end of the analyzer, as advised by Michael McCandless in http://lucene.472066.n3.nabble.com/SynonymFilterFactory-deprecated-td4360455.html: "You need only one FlattenGraphFilter at the end of your analysis chain."

So where should we look to prevent this exception from happening?

Thank you very much in advance,

Best
Remi
