Hi Rick,

Yep, that's really weird, because I am using the StandardTokenizerFactory, which is supposed to split on whitespace, so no token should contain a space. I also tried the WhitespaceTokenizerFactory. I'll have a look at other analyzers, or if nothing works, maybe implement my own.
I am using a Shingle filter right after the StandardTokenizer; not sure if that has anything to do with it.

Thanks,
Angel

On Tue, Jul 25, 2017 at 12:09 AM Rick Leir <rl...@leirtech.com> wrote:
> Angel,
> The 20 byte is an ASCII space character, which is a separator in most
> contexts. Breaking the buffer at spaces, you can see 6 non-space tokens.
>
> Have a look at your analysis chain and see why you are getting this.
> Cheers -- Rick
>
> On July 24, 2017 4:27:00 PM EDT, Angel Todorov <attodo...@gmail.com> wrote:
> >Hi guys,
> >
> >I am trying to set up the FreeTextSuggester lookup factory in a
> >suggester definition in Solr. Unfortunately, while the index is
> >building, I am encountering the following errors:
> >
> >"msg":"tokens must not contain separator byte; got token=[30 20 30 20
> >32 20 72 20 61 6c 6c 65 6e 20 72] but gramCount=6, which is greater
> >than expected max ngram
> >size=5","trace":"java.lang.IllegalArgumentException: tokens must not
> >contain separator byte; got token=[30 20 30 20 32 20 72 20 61 6c 6c 65
> >6e 20 72] but gramCount=6, which is greater than expected max ngram
> >size=5\r\n\tat
> >org.apache.lucene.search.suggest.analyzing.FreeTextSuggester.build(FreeTextSuggester.java:362)\r\n\tat
> >
> >I've also opened the following issue, because I don't think it's right
> >not to handle this exception:
> >
> >https://issues.apache.org/jira/browse/SOLR-11139
> >
> >But my question is about the error in general -- why is it occurring?
> >I only have English text, nothing special.
> >
> >Thanks,
> >Angel
>
> --
> Sorry for being brief. Alternate email is rickleir at yahoo dot com
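
P.S. Decoding the byte dump from the error makes Rick's point concrete. A quick sketch (Python, just for illustration; the shingle size of 2 at the end is an assumed example, not my actual config):

```python
# Decode the hex byte dump from the FreeTextSuggester error message.
hex_dump = "30 20 30 20 32 20 72 20 61 6c 6c 65 6e 20 72"
token = bytes(int(b, 16) for b in hex_dump.split()).decode("ascii")
print(repr(token))  # the offending "token" is really a multi-word string

# The token contains 0x20 (space) bytes -- the separator -- and splits
# into 6 words, one more than the expected max ngram size of 5.
words = token.split(" ")
print(len(words))

# Sketch of what an upstream shingle filter emits: adjacent tokens joined
# with a space, so every emitted shingle itself contains the separator byte.
shingles = [" ".join(words[i:i + 2]) for i in range(len(words) - 1)]
print(shingles)
```

If I understand it right, FreeTextSuggester builds its own ngrams internally, so feeding it pre-shingled, space-joined tokens is what trips the separator-byte check.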