Re: Destemming snafu

Stephen Weiss Thu, 18 Jun 2009 14:11:19 -0700

Yes, that's exactly what I needed. I don't know how I missed that. Thank you!


--
Steve


On Jun 18, 2009, at 4:49 PM, Brendan Grainger wrote:

Are you using Porter Stemming? If so, I think you can just specify your word in the protwords.txt file (or whatever you've called it).
Check out http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters and the example config for the Porter Stemmer:
<fieldtype name="myfieldtype" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
  </analyzer>
</fieldtype>
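
For illustration, a minimal sketch of what protwords.txt might contain for this case. The entry is an assumption based on the word in question; list each term in the form it has when it reaches the stemmer (so lowercased if a lowercase filter runs earlier in the chain), one per line:

```
# protwords.txt: terms listed here are passed over by the stemmer
Stylesightings
```

Since the stemmer also runs at index time, documents that already contain the word would presumably need to be reindexed before the change takes effect.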

HTH
Brendan

On Jun 18, 2009, at 4:38 PM, Stephen Weiss wrote:
Hi,
I've hit a bit of a problem with destemming and could use some advice.
Right now there is a word in the index called "Stylesight" and another word "Stylesightings", which was just added. When users search for "Stylesightings", the client really only wants them to get results that match "Stylesightings" and not "Stylesight", as they are two [relatively] unrelated things. However, I'm guessing because of the destemmer, "Stylesightings" becomes "Stylesight" internally... which results in the "wrong" behavior.
I really don't want to turn off the destemmer, that's like killing an ant with a nuke. I was thinking, perhaps, since we use both index- and query-time synonyms, I could make a synonym like this:
"Stylesightings" =>  "xlkje0r923jjfsdf"
or some other random string of un-destemmable junk. That might work, but I'm not sure, and reindexing all the affected documents will take quite some time, so it would be good to know in advance if this is even a good idea.
Of course, if there's another, better idea, I'd be very open to that too.
Thanks for any suggestions!

--
Steve
