Context: Solr/Lucene 5.1, adding documents to a Solr core/index through SolrJ.

I extract PDFs using Tika. The PDF content is one of the fields of the documents I transmit to Solr via SolrJ (a simplified sketch of the indexing code follows the stack trace below). Since not all documents seem to make it into the index, I looked into the Solr logs and found the following exceptions:

    org.apache.solr.common.SolrException: Exception writing document id fustusermanuals#4614 to the index; possible analysis error.
        at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:170)
        at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)
        at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
        at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:931)
        at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1085)
        ...
        at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
        at java.lang.Thread.run(Unknown Source)
    Caused by: java.lang.IllegalArgumentException: Document contains at least one immense term in field="content__s_i_suggest" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped. Please correct the analyzer to not produce such terms. The prefix of the first immense term is: '[10, 32, 10, 32, 10, 10, 70, 82, 32, 77, 111, 100, 101, 32, 100, 39, 101, 109, 112, 108, 111, 105, 32, 10, 10, 32, 10, 10, 32, 10]...', original message: bytes can be at most 32766 in length; got 186493
        at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:667)
        at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:344)
        at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:300)
        at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:231)
        at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:449)
        at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1349)
        at org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:242)
        at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:166)
        ... 40 more
    Caused by: org.apache.lucene.util.BytesRefHash$MaxBytesLengthExceededException: bytes can be at most 32766 in length; got 186493
        at org.apache.lucene.util.BytesRefHash.add(BytesRefHash.java:284)
        at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:154)
        at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:657)
        ... 47 more
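For completeness, this is roughly how the documents are built and sent. It is a simplified sketch: the Solr URL, the id value and the class name are placeholders; only content__s_i_suggest is the actual field from my schema.

    import java.io.InputStream;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;

    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.common.SolrInputDocument;
    import org.apache.tika.metadata.Metadata;
    import org.apache.tika.parser.AutoDetectParser;
    import org.apache.tika.sax.BodyContentHandler;

    public class PdfIndexer {
        public static void main(String[] args) throws Exception {
            // Placeholder URL; the core name matches the ids in the log ("fustusermanuals#...").
            SolrClient solr = new HttpSolrClient("http://localhost:8983/solr/fustusermanuals");

            Path pdf = Paths.get(args[0]);

            // Extract the full text with Tika; -1 disables Tika's own write limit,
            // so large PDFs yield correspondingly large content strings.
            BodyContentHandler handler = new BodyContentHandler(-1);
            try (InputStream in = Files.newInputStream(pdf)) {
                new AutoDetectParser().parse(in, handler, new Metadata());
            }

            // The extracted text goes into the field that triggers the exception.
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "fustusermanuals#4614"); // placeholder id
            doc.addField("content__s_i_suggest", handler.toString());
            solr.add(doc);
            solr.commit();
        }
    }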
How can I tell Solr/SolrJ to allow a larger payload?

I also see exceptions like the following, which seem to result from the same limitation:

    org.apache.solr.common.SolrException: Exception writing document id fustusermanuals#3323 to the index; possible analysis error.
        at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:170)
        at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)
        at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
        at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:931)
        at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1085)
        at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:697)
        ...
        at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
        at java.lang.Thread.run(Unknown Source)
    Caused by: java.lang.IllegalArgumentException: Document contains at least one immense term in field="content__s_i_suggest" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped. Please correct the analyzer to not produce such terms. The prefix of the first immense term is: '[10, 69, 78, 32, 76, 67, 68, 32, 116, 101, 108, 101, 118, 105, 115, 105, 111, 110, 10, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95]...', original message: bytes can be at most 32766 in length; got 164683
        at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:667)
        at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:344)
        at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:300)
        at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:231)
        at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:449)
        at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1349)
        at org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:242)
        at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:166)
        ... 40 more
    Caused by: org.apache.lucene.util.BytesRefHash$MaxBytesLengthExceededException: bytes can be at most 32766 in length; got 164683
        at org.apache.lucene.util.BytesRefHash.add(BytesRefHash.java:284)
        at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:154)
        at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:657)
        ... 47 more

Unfortunately, I have to extract the PDFs in my client.
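The only workaround I can think of so far is to truncate the extracted content on the client before adding the field, along these lines (my own sketch, not a SolrJ facility; 32766 is the byte limit quoted in the exception, and since the limit applies to the UTF-8 encoding, the cut has to be made on the bytes rather than on the character count):

    import java.nio.charset.StandardCharsets;

    public final class TermTruncator {

        // The limit quoted in the exception message.
        private static final int MAX_TERM_BYTES = 32766;

        /** Cuts a string so that its UTF-8 encoding stays within Lucene's term limit. */
        public static String truncateUtf8(String value) {
            byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
            if (utf8.length <= MAX_TERM_BYTES) {
                return value;
            }
            // Crude cut: if it lands inside a multi-byte character, the decoder
            // replaces the dangling bytes with U+FFFD, which is acceptable here.
            return new String(utf8, 0, MAX_TERM_BYTES, StandardCharsets.UTF_8);
        }
    }

The field would then be filled with doc.addField("content__s_i_suggest", TermTruncator.truncateUtf8(handler.toString())). But that throws away content, so a setting on the Solr side would be preferable.

Thanks,
Clemens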