On 5/1/2018 8:40 AM, THADC wrote:
> I get the following exception:
>
> *Exception writing document id FULL_36265 to the index; possible analysis
> error: Document contains at least one immense term in
> field="gridFacts_tsing" (whose UTF8 encoding is longer than the max length
> 32766), all of which were skipped. Please correct the analyzer to not
> produce such terms. The prefix of the first immense term is: '[108, 111,
> 114, 101, 109, 32, 105, 112, 115, 117, 109, 32, 100, 111, 108, 111, 114, 32,
> 115, 105, 116, 32, 97, 109, 101, 116, 44, 32, 99, 111]...', original
> message: bytes can be at most 32766 in length; got 68144.*
>
> Any ideas are greatly appreciated. Thank you.
The error is not ambiguous; it tells you precisely what the problem is. A single term in a Lucene index cannot be longer than 32766 bytes, and this field has a term more than twice that size (68144 bytes).

I'm guessing that the fieldType named alphaOnlySort is one of two things: either the StrField class, or the TextField class with the keyword tokenizer factory. Both of those treat the entire input as one single term.

To fix this problem, you will need to either reduce the size of the input sent to the field, or use an analysis chain that splits the input into smaller tokens. The numbers in the error message are the UTF-8 byte values of the term's prefix; decoded, they spell "lorem ipsum dolor sit amet, co", so the input is ordinary text that should be tokenized into words, not treated as a single term.
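As a rough sketch (untested, and the fieldType name text_tokenized is just an example), something like this in schema.xml would break the input into individual word tokens, none of which will come anywhere near the 32766-byte limit:

    <fieldType name="text_tokenized" class="solr.TextField"
               positionIncrementGap="100">
      <analyzer>
        <!-- splits on whitespace/punctuation instead of keeping one giant term -->
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

    <field name="gridFacts_tsing" type="text_tokenized"
           indexed="true" stored="true"/>

Be aware that a tokenized field no longer behaves like alphaOnlySort for alphabetical sorting. If you need to sort on this data, index a truncated copy of the value in a separate sort field rather than sorting on the full text.

Thanks,
Shawn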