Silly thing … maybe the immense term was generated because you set
"string" as the field type for your text?
Could that be it?
Can you wipe out the index, set a proper analyzed type for your text field,
and index again?
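To illustrate, a minimal sketch of the kind of schema.xml change I mean (the field name "text" and the analyzer chain here are just assumptions based on the stack trace; adapt to your schema):

```xml
<!-- Hypothetical example: use an analyzed text type instead of "string",
     so the extracted PDF body is tokenized into many small terms rather
     than indexed as one immense term. -->
<field name="text" type="text_general" indexed="true" stored="true"/>

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Splits on word boundaries; each resulting term is far below
         the 32766-byte term length limit. -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With a "string" type the whole field value is indexed as a single term, which is what blows past the 32766-byte limit; an analyzed type avoids that without truncating or dropping the content.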
No worries about the incomplete stack trace.
We all learn and get things wrong every day :)
Errare humanum est

Cheers

2015-06-19 14:31 GMT+01:00 Paden <rumsey...@gmail.com>:

> Yeah, I'm just gonna say, hands down, this was a totally bad question. My
> fault, mea culpa. I'm pretty new to working in an IDE environment and
> using a stack trace (I just finished my first year of CS at university and
> now I'm interning). I'm actually kind of embarrassed by how long it took
> me to realize I wasn't looking at the entire stack trace. Idiot moment of
> the week for sure. Thanks for the patience, guys, but when I looked at the
> entire stack trace it gave me this.
>
> Caused by: java.lang.IllegalArgumentException: Document contains at least
> one immense term in field="text" (whose UTF8 encoding is longer than the
> max length 32766), all of which were skipped.  Please correct the analyzer
> to not produce such terms.  The prefix of the first immense term is: '[84,
> 104, 101, 32, 73, 78, 76, 32, 105, 115, 32, 97, 32, 85, 46, 83, 46, 32,
> 68, 101, 112, 97, 114, 116, 109, 101, 110, 116, 32, 111]...', original
> message: bytes can be at most 32766 in length; got 44360
>         at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:667)
>         at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:344)
>         at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:300)
>         at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:232)
>         at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:458)
>         at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1350)
>         at org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:239)
>         at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:163)
>         ... 40 more
> Caused by: org.apache.lucene.util.BytesRefHash$MaxBytesLengthExceededException:
> bytes can be at most 32766 in length; got 44360
>         at org.apache.lucene.util.BytesRefHash.add(BytesRefHash.java:284)
>         at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:154)
>         at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:657)
>         ... 47 more
>
>
> And it took me all of two seconds to realize what had gone wrong. Now I'm
> just trying to figure out how to index the text content without truncating
> all the info or filtering it out entirely, thereby messing up my searching
> capabilities.
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Error-when-submitting-PDF-to-Solr-w-text-fields-using-SolrJ-tp4212704p4212919.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
--------------------------

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England
