Steve:

You _really_ want to get acquainted with the admin UI/Analysis page ;). Choose 
a core/collection and the “Analysis” choice should appear. It shows you exactly what 
transformations your data goes through. If you hover over the light gray pairs 
of letters, you’ll get a tooltip showing you what part of your analysis chain 
is responsible for a particular change. I un-check the “verbose” box 95% of the 
time BTW.

The critical bit is that what comes out of the end of the analysis pipe are the 
tokens that are actually _in_ the index. From there, problems like this make 
more sense.

My bet is that, as Walter says, you have a stemmer in the analysis chain and 
the actual token in the index is “kinas” so of course “kinase*” won’t be found. 
By adding OR kinase to the query, that token is stemmed to “kinas” and matches.
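
To make that concrete, here is a toy sketch in Python. It is not Solr code: 
toy_stem is a made-up stand-in for whatever stemmer is really in your chain 
(it only strips a trailing "es" or "e"), and the two documents are invented. 
It just illustrates the mechanism I'm guessing at: the index holds the stemmed 
token, a plain term is stemmed before lookup, and a prefix is not.

```python
from collections import defaultdict


def toy_stem(token: str) -> str:
    """Hypothetical stand-in for a real stemmer: strip a trailing 'es' or 'e'."""
    token = token.lower()
    for suffix in ("es", "e"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def build_index(docs: dict) -> dict:
    """Index-time analysis: whitespace-tokenize, stem, record doc ids per term."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            index[toy_stem(token)].add(doc_id)
    return index


def term_query(index: dict, word: str) -> set:
    """A plain term goes through the same stemmer before lookup."""
    return set(index.get(toy_stem(word), set()))


def prefix_query(index: dict, prefix: str) -> set:
    """A wildcard prefix is only lowercased, never stemmed."""
    prefix = prefix.lower()
    hits = set()
    for term, doc_ids in index.items():
        if term.startswith(prefix):
            hits |= doc_ids
    return hits


# Invented sample documents, for illustration only.
docs = {1: "protein kinase activity", 2: "kinases regulate signaling"}
index = build_index(docs)

print(sorted(index))                    # the tokens actually in the index
print(term_query(index, "kinase"))      # stemmed to "kinas", finds both docs
print(prefix_query(index, "kinase"))    # no indexed token starts with "kinase"
print(prefix_query(index, "kinas"))     # the stemmed prefix does match
```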

Also, adding &debug=query to your URL will show you what the query looks like 
after parsing and analysis, also a major tool for figuring out what’s really 
happening.
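
If you'd rather script it than edit the URL by hand, something like this 
builds the request. The host, port and core name ("mycore") are placeholders 
I made up, so substitute your own; the only parts taken from above are the 
kinase* query and debug=query.

```python
from urllib.parse import urlencode


def build_debug_url(base_url: str, query: str) -> str:
    """Return a select URL that asks Solr to include the parsed query in the response."""
    params = {"q": query, "debug": "query"}
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint: adjust host, port and core/collection name.
base_url = "http://localhost:8983/solr/mycore/select"
url = build_debug_url(base_url, "kinase*")
print(url)
```

Paste the printed URL into a browser (or curl it) and look at the debug 
section of the response.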

Wildcards are not stemmed, which can lead to surprising results. There’s no 
perfect answer here. Let’s claim wildcards _were_ stemmed. Then you’d have to 
try to explain why “running*” returned a doc with only “run” or “runner” or 
“runs” or... in it, but searching for “runnin*” did not, because the stemmer 
doesn’t recognize “runnin” as a stemmable word.
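
Here is that thought experiment as a small Python sketch. Again, this is not 
how Solr behaves (wildcards are not stemmed); toy_ing_stem is an invented 
stemmer that only knows about "-ing", and the indexed terms are made up.

```python
def toy_ing_stem(token: str) -> str:
    """Hypothetical stemmer: strip 'ing', then collapse a doubled final letter."""
    token = token.lower()
    if token.endswith("ing") and len(token) > 5:
        token = token[:-3]
        if len(token) >= 2 and token[-1] == token[-2]:
            token = token[:-1]
    return token


def stemmed_prefix_matches(prefix: str, indexed_terms: set) -> list:
    """Pretend wildcards were stemmed: stem the prefix, then prefix-match."""
    stemmed = toy_ing_stem(prefix)
    return sorted(t for t in indexed_terms if t.startswith(stemmed))


# Invented index contents, for illustration only.
indexed_terms = {"run", "runner", "runs"}

print(stemmed_prefix_matches("running", indexed_terms))  # prefix becomes "run"
print(stemmed_prefix_matches("runnin", indexed_terms))   # prefix stays "runnin"
```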

Finally, one of my personal hot buttons is wildcards in general. They’re very 
often over-used because people are used to simple search capabilities. 
Something about “if your only tool is a hammer, every problem looks like a 
nail”. That gets into training users too though...

Best,
Erick

> On Feb 11, 2020, at 9:24 PM, Fischer, Stephen 
> <sfisc...@pennmedicine.upenn.edu> wrote:
> 
> Hi,
> 
> I am a solr newbie.  I was surprised to discover that a search for kinase* 
> returned fewer results than kinase.
> 
> Then I read the wildcard 
> documentation<https://lucene.apache.org/solr/guide/6_6/the-standard-query-parser.html#TheStandardQueryParser-WildcardSearches>,
>  and saw why.  kinase* will not match the word "kinase".
> 
> Our end-users won't expect this behavior.  Presumably the solution would be 
> for them (actually us, on their behalf), to use kinase* OR kinase.
> 
> But that is kind of a hack.
> 
> Is there a way we can configure solr to have wildcards match on end-of-word?
> 
> Thanks,
> Steve
