Solr 3.1 has some new features that make stemming easier to tune,
in particular:

http://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_3_1/solr/src/java/org/apache/solr/analysis/StemmerOverrideFilterFactory.java

This takes a tab-separated list of words->stems, and flags each mapped
term so that any downstream stemmer leaves your mappings alone (thus
the name: StemmerOverrideFilter).

So the idea is you pick a stemmer that's close to what you want, then
you put this filter before it to tune it to your needs.
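
To make the mechanism concrete, here is a rough Python sketch of the
idea. It is not the Lucene/Solr code: the function names, the toy
"strip a trailing ing" stemmer, and the sample dictionary are all
made up for illustration. It only shows the flow: the override step
rewrites mapped words and flags them, and the stemmer after it skips
anything flagged.

```python
def load_overrides(text):
    """Parse tab-separated 'word<TAB>stem' lines into a dict."""
    overrides = {}
    for line in text.splitlines():
        if "\t" not in line:
            continue  # skip blank or malformed lines
        word, stem = line.split("\t", 1)
        overrides[word.strip()] = stem.strip()
    return overrides


def override_filter(tokens, overrides):
    """Yield (term, is_keyword) pairs; mapped terms are replaced and flagged."""
    for tok in tokens:
        if tok in overrides:
            yield overrides[tok], True
        else:
            yield tok, False


def toy_stemmer(flagged_tokens):
    """Hypothetical stand-in for a real stemmer: drops a trailing 'ing'
    from longer words, but never touches a flagged term."""
    for term, is_keyword in flagged_tokens:
        if not is_keyword and term.endswith("ing") and len(term) > 5:
            term = term[:-3]
        yield term


def analyze(text, overrides):
    """Whitespace-tokenize, apply overrides, then run the toy stemmer."""
    tokens = text.lower().split()
    return list(toy_stemmer(override_filter(tokens, overrides)))


# Made-up sample dictionary: force seating -> seat, protect unknowing.
OVERRIDES = load_overrides("seating\tseat\nunknowing\tunknowing\n")

print(analyze("stadium seating chairs", OVERRIDES))
print(analyze("unknowing counseling", OVERRIDES))
```

Mapping a word to itself (as with "unknowing" here) is how you would
protect a word from the stemmer in this sketch, while "counseling"
still goes through the stemmer as usual.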


On Wed, Mar 30, 2011 at 12:05 PM, Robert Petersen <rober...@buy.com> wrote:
> Thanks for the input!  We've discussed using synonyms to help here.  We
> have product managers who are supposed to add keywords on to skus also
> which our indexer will automatically consume.  Getting them to do that
> is a different matter!  haha
>
> -----Original Message-----
> From: Jonathan Rochkind [mailto:rochk...@jhu.edu]
> Sent: Tuesday, March 29, 2011 11:19 AM
> To: solr-user@lucene.apache.org
> Subject: Re: FW: no results searching for stadium seating chairs
>
> It seems unlikely you are going to find something that stems everything
> exactly how you want it, and nothing how you don't want it. This is very
> domain dependent, as you've discovered. I doubt there's even such a
> thing as the way everyone doing a 'retail product title search' would
> want it, it's going to vary.
>
> You could use the synonym feature to make your own stemming dictionary,
> tell it to stem "seating" to "seat".
>
> Of course, that's also very "expensive" in terms of your time, to create
> your own custom dictionary.  But you're going to have to live with one
> of the compromises, software can't do magic!
>
> For particular titles, you could also, in your own metadata control, add
> "alternate titles" that you want it to match on, before it even gets
> indexed.
>
> On 3/29/2011 1:43 PM, Robert Petersen wrote:
>> For retail product title search, would there be a better stemmer to
>> use?  We wanted a less aggressive stemmer, but I would expect the term
>> seating to stem.  I have found several other words which end in ing and
>> do not get stemmed.  Amongst our product lines are four million books
>> with all kinds of crazy titles, like the following oddity!  Here
>> counseling stems and unknowing doesn't:
>>
>> 1. The Cloud of Unknowing and the Book of Privy Counseling
>> Buy New: $29.95 $18.30
>> 3 New and Used from $18.30
>>
>>
>> -----Original Message-----
>> From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
>> Sent: Tuesday, March 29, 2011 10:27 AM
>> To: solr-user@lucene.apache.org
>> Cc: Robert Petersen
>> Subject: Re: FW: no results searching for stadium seating chairs
>>
>> On Tue, Mar 29, 2011 at 1:17 PM, Robert Petersen <rober...@buy.com> wrote:
>>> Very interestingly, LucidKStemFilterFactory is stemming 'ing's
>>> differently for different words.  The word 'seating' doesn't lose the
>>> 'ing' but the word 'counseling' does!  Can anyone explain the difference
>>> here?  protwords.txt is empty btw.
>> KStem is dictionary driven, so "seating" is probably in the
>> dictionary.  I guess the author decided that "seating" and "seat" were
>> sufficiently different.
>>
>>
>> -Yonik
>> http://www.lucenerevolution.org -- Lucene/Solr User Conference, May
>> 25-26, San Francisco
>>
>
