On 5/1/2018 8:40 AM, THADC wrote:
> I get the following exception:
>
> *Exception writing document id FULL_36265 to the index; possible analysis
> error: Document contains at least one immense term in
> field="gridFacts_tsing" (whose UTF8 encoding is longer than the max length
> 32766), all of which were skipped. Please correct the analyzer to not
> produce such terms. The prefix of the first immense term is: '[108, 111,
> 114, 101, 109, 32, 105, 112, 115, 117, 109, 32, 100, 111, 108, 111, 114, 32,
> 115, 105, 116, 32, 97, 109, 101, 116, 44, 32, 99, 111]...', original
> message: bytes can be at most 32766 in length; got 68144.*
>
> Any ideas are greatly appreciated. Thank you.
The error is not ambiguous; it tells you precisely what the problem is. A single term in a Lucene index cannot be longer than 32766 bytes, and this field has a term more than twice that size (68144 bytes).

I'm guessing that the fieldType named alphaOnlySort is one of two things: either the StrField class, or the TextField class with the keyword tokenizer factory. Both of those treat the entire input as one single term.

To fix this problem, you will need to either reduce the size of the input sent to the field, or use an analysis chain that splits the input into smaller tokens. The numbers in the error message are the UTF-8 byte values of the term's prefix; decoded, they spell "lorem ipsum dolor sit amet, co", so the input is ordinary text that should be tokenized into words, not treated as a single term.
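As a rough sketch (untested, and the fieldType name text_tokenized is just an example), something like this in schema.xml would break the input into individual word tokens, none of which will come anywhere near the 32766-byte limit:

    <fieldType name="text_tokenized" class="solr.TextField"
               positionIncrementGap="100">
      <analyzer>
        <!-- splits on whitespace/punctuation instead of keeping one giant term -->
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

    <field name="gridFacts_tsing" type="text_tokenized"
           indexed="true" stored="true"/>

Be aware that a tokenized field no longer behaves like alphaOnlySort for alphabetical sorting. If you need to sort on this data, index a truncated copy of the value in a separate sort field rather than sorting on the full text.

Thanks,
Shawn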