Yeah, I'll say it: hands down, this was a bad question on my part. Mea culpa.
I'm pretty new to working in an IDE and reading stack traces (I just finished
my first year of CS at university and I'm now interning). I'm honestly a
little embarrassed by how long it took me to realize I wasn't looking at the
entire stack trace. Idiot moment of the week for sure. Thanks for the
patience, everyone. When I finally looked at the full stack trace, this is
what I got:

Caused by: java.lang.IllegalArgumentException: Document contains at least
one immense term in field="text" (whose UTF8 encoding is longer than the max
length 32766), all of which were skipped.  Please correct the analyzer to
not produce such terms.  The prefix of the first immense term is: '[84, 104,
101, 32, 73, 78, 76, 32, 105, 115, 32, 97, 32, 85, 46, 83, 46, 32, 68, 101,
112, 97, 114, 116, 109, 101, 110, 116, 32, 111]...', original message: bytes
can be at most 32766 in length; got 44360
        at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:667)
        at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:344)
        at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:300)
        at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:232)
        at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:458)
        at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1350)
        at org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:239)
        at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:163)
        ... 40 more
Caused by: org.apache.lucene.util.BytesRefHash$MaxBytesLengthExceededException: bytes can be at most 32766 in length; got 44360
        at org.apache.lucene.util.BytesRefHash.add(BytesRefHash.java:284)
        at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:154)
        at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:657)
        ... 47 more


And it took me all of two seconds to realize what had gone wrong. Now I'm
just trying to figure out how to index the text content without truncating
it or filtering it out entirely, since either of those would break my
search.
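
If I'm reading the error right, the whole extracted PDF body is being indexed
as a single term, which usually means the "text" field is defined with an
unanalyzed type (like "string") or a keyword-style tokenizer. My current plan,
and this is only a sketch since I haven't confirmed what's actually in my
schema yet, is to switch the field to a tokenized text type so the content
gets broken into word-sized terms that each fit under the 32766-byte limit,
with nothing truncated or thrown away. The field and type names below are
just placeholders for whatever my schema.xml really uses:

    <!-- before (hypothetical): the entire body becomes one immense term -->
    <field name="text" type="string" indexed="true" stored="true"/>

    <!-- after: an analyzed full-text type splits the body into individual terms -->
    <field name="text" type="text_general" indexed="true" stored="true"/>

    <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

Does that sound like the right direction, or is there a better way to keep
the full text searchable?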


