There are some new features in 3.1 to make it easier to tune this stuff, especially:
http://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_3_1/solr/src/java/org/apache/solr/analysis/StemmerOverrideFilterFactory.java

This takes a tab-separated list of words->stems, and sets a flag telling
any downstream stemmer not to mess with any of your mappings (thus the
name: StemmerOverrideFilter).

So the idea is you pick a stemmer that's close to what you want, then
you put this filter before it to tune it to your needs.

On Wed, Mar 30, 2011 at 12:05 PM, Robert Petersen <rober...@buy.com> wrote:
> Thanks for the input!  We've discussed using synonyms to help here.  We
> have product managers who are supposed to add keywords on to skus also
> which our indexer will automatically consume.  Getting them to do that
> is a different matter!  haha
>
> -----Original Message-----
> From: Jonathan Rochkind [mailto:rochk...@jhu.edu]
> Sent: Tuesday, March 29, 2011 11:19 AM
> To: solr-user@lucene.apache.org
> Subject: Re: FW: no results searching for stadium seating chairs
>
> It seems unlikely you are going to find something that stems everything
> exactly how you want it, and nothing how you don't want it. This is very
> domain dependent, as you've discovered. I doubt there's even such a
> thing as the way everyone doing a 'retail product title search' would
> want it, it's going to vary.
>
> You could use the synonym feature to make your own stemming dictionary,
> tell it to stem "seating" to "seat".
>
> Of course, that's also very "expensive" in terms of your time, to create
> your own custom dictionary.  But you're going to have to live with one
> of the compromises, software can't do magic!
>
> For particular titles, you could also, in your own metadata control, add
> "alternate titles" that you want it to match on, before it even gets
> indexed.
>
> On 3/29/2011 1:43 PM, Robert Petersen wrote:
>> For retail product title search, would there be a better stemmer to
>> use?  We wanted a less aggressive stemmer, but I would expect the term
>> seating to stem.  I have found several other words which end in ing and
>> do not get stemmed.  Amongst our product lines are four million books
>> with all kinds of crazy titles, like the following oddity!  Here
>> counseling stems and unknowing doesn't:
>>
>> 1. The Cloud of Unknowing and the Book of Privy Counseling
>> Buy New: $29.95 $18.30
>> 3 New and Used from $18.30
>>
>>
>> -----Original Message-----
>> From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
>> Sent: Tuesday, March 29, 2011 10:27 AM
>> To: solr-user@lucene.apache.org
>> Cc: Robert Petersen
>> Subject: Re: FW: no results searching for stadium seating chairs
>>
>> On Tue, Mar 29, 2011 at 1:17 PM, Robert Petersen<rober...@buy.com> wrote:
>>> Very interestingly, LucidKStemFilterFactory is stemming 'ing's
>>> differently for different words.  The word 'seating' doesn't lose the
>>> 'ing' but the word 'counseling' does!  Can anyone explain the difference
>>> here?  protwords.txt is empty btw.
>> KStem is dictionary driven, so "seating" is probably in the
>> dictionary.  I guess the author decided that "seating" and "seat" were
>> sufficiently different.
>>
>>
>> -Yonik
>> http://www.lucenerevolution.org -- Lucene/Solr User Conference, May
>> 25-26, San Francisco
>>
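
P.S. A minimal schema.xml sketch of the StemmerOverrideFilterFactory setup
described at the top of this message. The field type name text_retail, the
file name stemdict.txt, the tokenizer, and the choice of
PorterStemFilterFactory as the downstream stemmer are all assumptions for
illustration, not something taken from the thread. Whether a given stemmer
honors the override flag should be checked against its source. The
dictionary file would hold one mapping per line: the word, a tab, then the
stem (for example seating, a tab, then seat).

```xml
<!-- Hypothetical field type; the name and the dictionary file name are placeholders. -->
<fieldType name="text_retail" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Placed before the stemmer: rewrites each listed word to its given stem
         and flags the token so a downstream stemmer can leave it alone. -->
    <filter class="solr.StemmerOverrideFilterFactory" dictionary="stemdict.txt" ignoreCase="true"/>
    <!-- Assumed stemmer; swap in whichever one is closest to what you want. -->
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
```

Since the same analyzer runs at index and query time in this sketch,
documents would need to be reindexed after the dictionary changes.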