Nope. The underscore is already preserved in the tokenizer's output, before the token stream reaches any filters. You can verify this on the "Analysis" page of the Solr Admin UI: choose the field type "text_general" and run an index-time analysis.
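
For a quick illustration outside Solr, here is a rough Python sketch of the behavior in question. It is only an approximation: it does not call Lucene, and the helper name rough_tokens is made up for this example. It relies on the fact that Python's \w class, like the word-break rules StandardTokenizer follows, treats the underscore as a word character while the hyphen is not one. It does not reproduce the tokenizer's other rules (for example, how periods are handled).

```python
import re

# Hypothetical helper, not part of Solr or Lucene.
# \w matches letters, digits and the underscore, so an underscore never
# ends a token here, while a hyphen does.
WORD = re.compile(r"\w+")


def rough_tokens(text):
    """Split text into runs of word characters (letters, digits, '_')."""
    return WORD.findall(text)


print(rough_tokens("hello-world"))  # ['hello', 'world']
print(rough_tokens("hello_world"))  # ['hello_world']
```

The Analysis page remains the authoritative check, since it runs the actual analyzer chain configured for the field type.
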

Thanks,
Rahul

On Sat, Jan 9, 2021 at 8:22 AM xiefengchang <fengchang_fi...@163.com> wrote:

> Did you configure PatternReplaceFilterFactory?
>
> At 2021-01-08 12:16:06, "Rahul Goswami" <rahul196...@gmail.com> wrote:
> >Hello,
> >So recently I was debugging a problem on Solr 7.7.2 where the query wasn't
> >returning the desired results. Turned out that the indexed terms had
> >underscore separated terms, but the query didn't. I was under the
> >impression that terms separated by underscore are also tokenized by
> >StandardTokenizerFactory, but turns out that's not the case. Eg:
> >'hello-world' would be tokenized into 'hello' and 'world', but
> >'hello_world' is treated as a single token.
> >Is this a bug or a designed behavior?
> >
> >If this is by design, it would be helpful if this behavior is included in
> >the documentation since it is similar to the behavior with periods.
> >
> >https://lucene.apache.org/solr/guide/6_6/tokenizers.html#Tokenizers-StandardTokenizer
> >"Periods (dots) that are not followed by whitespace are kept as part of the
> >token, including Internet domain names. "
> >
> >Thanks,
> >Rahul