Hi Rick,

Yep, that's really weird, because I am using the StandardTokenizerFactory,
which is supposed to split on whitespace. I've also tried the
WhitespaceTokenizerFactory. I'll have a look at other analyzers, and if
nothing works, maybe implement my own.

I am also using a ShingleFilter right after the StandardTokenizer; I'm not
sure if that has anything to do with it.
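In case it helps with debugging, here is a quick decode of the hex bytes from the error message quoted below. It shows the offending "token" already contains spaces, i.e. it is itself a shingle of six grams (this is just my own sanity check, not something from the Solr docs):

```python
# Decode the token bytes reported in the FreeTextSuggester error:
# [30 20 30 20 32 20 72 20 61 6c 6c 65 6e 20 72]
token_bytes = bytes([0x30, 0x20, 0x30, 0x20, 0x32, 0x20, 0x72, 0x20,
                     0x61, 0x6C, 0x6C, 0x65, 0x6E, 0x20, 0x72])
decoded = token_bytes.decode("ascii")
print(repr(decoded))    # '0 0 2 r allen r'
print(decoded.split())  # ['0', '0', '2', 'r', 'allen', 'r'] -- six grams
```

So the single token handed to the suggester is really six space-joined tokens, which matches gramCount=6 in the stack trace. My guess is that the ShingleFilter in my analysis chain joins tokens with spaces before FreeTextSuggester does its own internal shingling, and the suggester then sees more grams than its configured maximum.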


Thanks,
Angel
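P.S. For anyone hitting the same error: if the culprit is indeed double shingling, a minimal suggester definition that leaves shingling to FreeTextSuggester itself might look like the sketch below. The field and type names are placeholders, and the `ngrams` value is just an example:

```xml
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">freeTextSuggester</str>
    <str name="lookupImpl">FreeTextLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">title</str>
    <!-- FreeTextSuggester builds its own shingles up to this size, so the
         analyzer of the field type below should NOT contain a ShingleFilter -->
    <str name="ngrams">5</str>
    <str name="suggestFreeTextAnalyzerFieldType">text_general</str>
  </lst>
</searchComponent>
```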


On Tue, Jul 25, 2017 at 12:09 AM Rick Leir <rl...@leirtech.com> wrote:

> Angel,
> The 0x20 byte is the ASCII space character, which is a separator in most
> contexts. Breaking the buffer at spaces, you can see six non-space tokens.
>
> Have a look at your analysis chain and see why you are getting this.
> Cheers -- Rick
>
> On July 24, 2017 4:27:00 PM EDT, Angel Todorov <attodo...@gmail.com>
> wrote:
> >Hi guys,
> >
> >I am trying to set up the FreeTextSuggester / FreeTextLookupFactory in a
> >suggester definition in Solr. Unfortunately, while the index is building,
> >I am encountering the following errors:
> >
> >*"msg":"tokens must not contain separator byte; got token=[30 20 30 20
> >32
> >20 72 20 61 6c 6c 65 6e 20 72] but gramCount=6, which is greater than
> >expected max ngram size=5","trace":"java.lang.IllegalArgumentException:
> >tokens must not contain separator byte; got token=[30 20 30 20 32 20 72
> >20
> >61 6c 6c 65 6e 20 72] but gramCount=6, which is greater than expected
> >max
> >ngram size=5\r\n\tat
>
> >org.apache.lucene.search.suggest.analyzing.FreeTextSuggester.build(FreeTextSuggester.java:362)\r\n\tat
> >*
> >
> >I've also opened the following issue, because I don't think it's right
> >not to handle this exception:
> >
> >https://issues.apache.org/jira/browse/SOLR-11139
> >
> >But my question is about the error in general: why is it occurring? I
> >only have English text, nothing special.
> >
> >Thanks,
> >Angel
>
> --
> Sorry for being brief. Alternate email is rickleir at yahoo dot com
