Yes, and you might choose different options for different fields. For
dictionary searches, where users are looking for specific words and a high
degree of precision is called for, stemming is less helpful; for full-text
searches it is more so.
-Mike
On 4/23/2012 3:35 PM, Walter Underwood wrote:
There is a third approach. Create two fields and always query both of them,
with the exact field given a higher weight. This works great and performs well.
It is what we did at Netflix and what I'm doing at Chegg.
wunder
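A minimal sketch of what the two-field query described above might look like, assuming Solr's edismax query parser and its `qf` (query fields) parameter. The field names `text_exact` and `text_stemmed`, the boost values, and the helper `build_query_params` are hypothetical, chosen only for illustration; the message does not say which parser or weights were actually used.

```python
from urllib.parse import urlencode


def build_query_params(user_query,
                       exact_field="text_exact",      # hypothetical unstemmed field
                       stemmed_field="text_stemmed",  # hypothetical stemmed field
                       exact_boost=2.0,               # assumed weight, tune for your data
                       stemmed_boost=1.0):
    """Build request parameters that search both fields in one query.

    The exact (unstemmed) field gets the higher boost, so documents that
    contain the literal word rank above those that match only via stemming.
    """
    return {
        "q": user_query,
        "defType": "edismax",
        "qf": f"{exact_field}^{exact_boost} {stemmed_field}^{stemmed_boost}",
    }


params = build_query_params("lighter")
query_string = urlencode(params)  # append to the /select URL of your core
print(params["qf"])
```

Because both fields are always queried, no UI switch is needed: an exact hit simply scores higher than a stem-only hit.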
On Apr 23, 2012, at 12:21 PM, Andrew Wagner wrote:
So I just realized the other day that stemming basically happens at index
time. If I'm understanding correctly, there's no way, with a single index, to
let a user specify at query time whether particular words should be stemmed.
I think there are two options, but I'd love to hear that I'm wrong:
1.) Incrementally build up a whitelist of words that don't stem very well.
To pick a random example out of the blue, "light" isn't super closely
related to "lighter", so I might choose not to stem that. If I wanted to
do this, I think stemmerOverrideFilter would help me out. I'm not a big
fan of this approach.
2.) Index all the text in two fields, once with stemming and once without.
Then build some kind of option into the UI for specifying whether to stem
the words or not, and search the appropriate field. Unfortunately, this
would roughly double the size of my index, and probably affect query times
too. Plus, the UI would probably suck.
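To make the two options concrete, here is a toy sketch in plain Python. It is not Solr code: `toy_stem` is a deliberately crude suffix stripper standing in for a real stemmer, the `protected` set only imitates the idea of exempting words from stemming, and the documents are invented. It illustrates why the stemming decision is baked into the indexed terms.

```python
def toy_stem(word, protected=frozenset()):
    """Crude suffix stripper; a stand-in for a real stemmer."""
    if word in protected:
        return word  # option 1: listed words are left unstemmed
    for suffix in ("ing", "er", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_index(docs, analyze):
    """Map each analyzed term to the set of ids of documents containing it."""
    index = {}
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index.setdefault(analyze(token), set()).add(doc_id)
    return index


docs = {1: "a lighter for the grill", 2: "the light was on"}

# Option 2: the same text indexed twice, once stemmed and once as-is.
stemmed_index = build_index(docs, toy_stem)
exact_index = build_index(docs, lambda token: token)

# Option 1: a single stemmed index with "lighter" exempted.
protected_index = build_index(
    docs, lambda token: toy_stem(token, protected={"lighter"})
)

print(sorted(stemmed_index["light"]))    # [1, 2]: "lighter" was folded into "light"
print(sorted(exact_index["light"]))      # [2]: only the literal word matches
print(sorted(protected_index["light"]))  # [2]: the exemption keeps "lighter" separate
```

In the stemmed index the term "lighter" no longer exists, so nothing at query time can recover it; that is the reason the choice has to be made per field or per word list when indexing.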
Am I missing an option? Has anyone tried one of these approaches?
Thanks!
Andrew