Stupid me (yet again):
Should have used a TEXT field instead of (only) a STRING field for the content ;)
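(For context: a string field is indexed as one single, untokenized term, so the whole extracted PDF content becomes one term and runs straight into the 32766-byte limit shown below, whereas a text field tokenizes the content first. A minimal sketch of the schema change; text_general and the attribute values are assumptions, only the field name is taken from the exceptions:)

    <!-- before: the whole extracted PDF content is indexed as a single term -->
    <field name="content__s_i_suggest" type="string"       indexed="true" stored="true"/>
    <!-- after: the content is tokenized, so no single term exceeds 32766 bytes -->
    <field name="content__s_i_suggest" type="text_general" indexed="true" stored="true"/>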

Another question I have though (which fits the subject even better):
In the log I see many
org.apache.solr.common.SolrException: missing content stream
        at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:69)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:144)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:2006)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:777)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:413)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:204)
...
        at org.eclipse.jetty.server.Server.handle(Server.java:368)
        at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
        at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:953)
        at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1014)
        at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:861)
        at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)
        at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
        at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:628)
        at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:52)
        at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
        at java.lang.Thread.run(Unknown Source)

What are possible reasons for this?
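My own suspicion so far: ContentStreamHandlerBase throws this when an update request arrives without any body, which with SolrJ can happen when an empty batch is sent. If that is the cause, a guard like this sketch would avoid it (SafeBatchAdd, client and docs are hypothetical names, not my real code):

    import java.util.Collection;

    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.common.SolrInputDocument;

    class SafeBatchAdd {
        // Hypothetical guard: only issue the update if the batch actually contains
        // documents; an empty add results in a request without a content stream.
        static void addIfNotEmpty(SolrClient client, Collection<SolrInputDocument> docs) throws Exception {
            if (docs == null || docs.isEmpty()) {
                return;
            }
            client.add(docs);
            client.commit();
        }
    }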
Thx
Clemens
-----Original Message-----
From: Clemens Wyss DEV [mailto:clemens...@mysign.ch]
Sent: Friday, April 24, 2015 14:01
To: solr-user@lucene.apache.org
Subject: o.a.s.c.SolrException: missing content stream

Context: Solr/Lucene 5.1
Adding documents to Solr core/index through SolrJ

I extract PDFs using Tika. The PDF content is one of the fields of my SolrDocuments that are transmitted to Solr using SolrJ, roughly as sketched below.
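A stripped-down sketch of this indexing path (the core URL and the file name are made up for the example; the field name and the document id are the ones from the exceptions below):

    import java.io.File;

    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.common.SolrInputDocument;
    import org.apache.tika.Tika;

    public class PdfIndexer {
        public static void main(String[] args) throws Exception {
            // extract the PDF text on the client side with Tika
            String content = new Tika().parseToString(new File("manual.pdf"));

            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "fustusermanuals#4614");
            doc.addField("content__s_i_suggest", content);

            // send the document to Solr via SolrJ
            try (SolrClient solr = new HttpSolrClient("http://localhost:8983/solr/mycore")) {
                solr.add(doc);
                solr.commit();
            }
        }
    }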
As not all documents seem to be "coming through", I looked into the Solr logs and see the following exceptions:
org.apache.solr.common.SolrException: Exception writing document id fustusermanuals#4614 to the index; possible analysis error.
        at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:170)
        at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)
        at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
        at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:931)
        at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1085)
...
        at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
        at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.IllegalArgumentException: Document contains at least one immense term in field="content__s_i_suggest" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped.  Please correct the analyzer to not produce such terms.  The prefix of the first immense term is: '[10, 32, 10, 32, 10, 10, 70, 82, 32, 77, 111, 100, 101, 32, 100, 39, 101, 109, 112, 108, 111, 105, 32, 10, 10, 32, 10, 10, 32, 10]...', original message: bytes can be at most 32766 in length; got 186493
        at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:667)
        at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:344)
        at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:300)
        at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:231)
        at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:449)
        at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1349)
        at org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:242)
        at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:166)
        ... 40 more
Caused by: org.apache.lucene.util.BytesRefHash$MaxBytesLengthExceededException: bytes can be at most 32766 in length; got 186493
        at org.apache.lucene.util.BytesRefHash.add(BytesRefHash.java:284)
        at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:154)
        at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:657)
        ... 47 more

How can I tell Solr/SolrJ to allow more payload?
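(Or is this not a request payload limit at all, but Lucene's hard 32766-byte cap on a single indexed term? In that case I could at least cap the field on the client before sending it; a rough sketch, with a hypothetical helper:)

    class TermLengthGuard {
        // Hypothetical helper: keep a field value safely below Lucene's 32766-byte
        // limit for a single indexed term. 8000 UTF-16 chars encode to at most
        // 24000 UTF-8 bytes, so the capped value always fits (a rough cut that may
        // split a surrogate pair at the boundary).
        static String capForSingleTerm(String value) {
            return value.length() > 8000 ? value.substring(0, 8000) : value;
        }
    }

It would then be used as doc.addField("content__s_i_suggest", TermLengthGuard.capForSingleTerm(content)) instead of adding the raw extracted value.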

I also see some
org.apache.solr.common.SolrException: Exception writing document id fustusermanuals#3323 to the index; possible analysis error.
        at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:170)
        at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)
        at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
        at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:931)
        at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1085)
        at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:697)
...
        at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
        at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.IllegalArgumentException: Document contains at least one immense term in field="content__s_i_suggest" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped.  Please correct the analyzer to not produce such terms.  The prefix of the first immense term is: '[10, 69, 78, 32, 76, 67, 68, 32, 116, 101, 108, 101, 118, 105, 115, 105, 111, 110, 10, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95]...', original message: bytes can be at most 32766 in length; got 164683
        at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:667)
        at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:344)
        at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:300)
        at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:231)
        at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:449)
        at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1349)
        at org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:242)
        at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:166)
        ... 40 more
Caused by: org.apache.lucene.util.BytesRefHash$MaxBytesLengthExceededException: bytes can be at most 32766 in length; got 164683
        at org.apache.lucene.util.BytesRefHash.add(BytesRefHash.java:284)
        at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:154)
        at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:657)
        ... 47 more

These seem to result from the same "limitation".

Unfortunately I must extract the PDFs in my client.

Thx
Clemens
