Re: StandardTokenizerFactory doesn't split on underscore

2021-01-09 Thread Rahul Goswami
Ah ok! Thanks Adam and Xiefeng

On Sat, Jan 9, 2021 at 6:02 PM Adam Walz wrote:
> It is expected that the StandardTokenizer will not break on underscores.
> The StandardTokenizer follows the Unicode UAX 29 standard which
> specifies an undersc

Re: StandardTokenizerFactory doesn't split on underscore

2021-01-09 Thread Adam Walz
It is expected that the StandardTokenizer will not break on underscores. The StandardTokenizer follows the Unicode UAX 29 standard which specifies an underscore as an "extender" and this rule says to not b
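
To illustrate the behavior described above, here is a minimal Python sketch. It is not Lucene's StandardTokenizer and does not implement UAX 29; it assumes a simplified word pattern (`\w+`) that, like the rule Adam describes, keeps underscores inside a token. The function names `tokenize` and `tokenize_splitting_underscores` are hypothetical, chosen only for this example. The second function shows one possible workaround: replacing underscores with spaces before tokenization, which is roughly what a pattern-replacing char filter placed ahead of the tokenizer might do in a Solr field type.

```python
import re

# Simplified stand-in for word segmentation: a run of letters, digits, and
# underscores is one token. This only approximates the "do not break around
# underscores" behavior; it is NOT the UAX 29 algorithm or Lucene code.
WORD = re.compile(r"\w+")


def tokenize(text):
    """Return tokens; an underscore between letters does not split them."""
    return WORD.findall(text)


def tokenize_splitting_underscores(text):
    """Hypothetical workaround: map '_' to a space before tokenizing,
    so underscore-separated parts become separate tokens."""
    return tokenize(text.replace("_", " "))


if __name__ == "__main__":
    sample = "error_code 42"
    print(tokenize(sample))
    print(tokenize_splitting_underscores(sample))
```

If the same normalization is applied at both index time and query time, a query for one part of an underscore-separated term would be expected to match; applying it on only one side would likely reproduce the mismatch reported in this thread.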

Re: StandardTokenizerFactory doesn't split on underscore

2021-01-09 Thread Rahul Goswami
Nope. The underscore is preserved right after tokenization, even before it reaches any filters. You can choose the type "text_general" and try an index-time analysis through the "Analysis" page on the Solr Admin UI.

Thanks,
Rahul

On Sat, Jan 9, 2021 at 8:22 AM xiefengchang wrote:
> did you configure

StandardTokenizerFactory doesn't split on underscore

2021-01-07 Thread Rahul Goswami
Hello,

Recently I was debugging a problem on Solr 7.7.2 where a query wasn't returning the desired results. It turned out that the indexed values contained underscore-separated terms, but the query didn't. I was under the impression that terms separated by underscore are also tokenized by StandardTokeni