Hi Walter and Jack,

Many thanks for your feedback!

I have no idea why the developer is using such an old version, but I am hoping that your feedback and suggestions will give them a push in the right direction. Is it a huge undertaking to upgrade from v3.6 to v5.5? (I surely hope not.)

Thanks again,
Sara

On Apr 14, 2016, at 2:55 PM, Jack Krupansky <jack.krupan...@gmail.com> wrote:
>
> Yes, this is the intended behavior. All of the Solr stemmers are based on
> heuristics that are not perfect, and are not based on the real dictionary.
> You can solve one problem by switching to another stemmer, but then you run
> into a different problem, rinse and repeat.
>
> The code has a specific rule that refrains from stemming a pattern that
> also happens to match your specified cases:
>
>   if (s[len-3] == 'i' || s[len-3] == 'a' || s[len-3] == 'o' ||
>       s[len-3] == 'e')
>     return len;
>
> See:
> https://github.com/apache/lucene-solr/blob/branch_3x/lucene/contrib/analyzers/common/src/java/org/apache/lucene/analysis/en/EnglishMinimalStemmer.java
>
> So, xxxies, xxxaes, xxxoes, and xxxees will all remain unstemmed. Exactly
> what the rationale for that rule was is unspecified in the code - no
> comments, other than to point to this research document:
> https://www.researchgate.net/publication/220433848_How_effective_is_suffixing
>
> -- Jack Krupansky
>
>> On Apr 14, 2016, at 1:44 PM, Walter Underwood <wun...@wunderwood.org> wrote:
>>
>> Solr 3.6 is a VERY old release. You won’t see any fixes for that.
>>
>> I would recommend starting with Solr 5.5 and keeping an eye on Solr 6.x,
>> which has just started releases.
>>
>> Removing -ing endings is pretty aggressive. That changes “tracking meeting”
>> into “track meet”.
>> Most of the time, you’ll be better off with an
>> inflectional stemmer that just converts plurals to singulars and other
>> similar changes.
>>
>> The Porter stemmer does not produce dictionary words. It produces “stems”.
>> Those are the same for the singular and plural forms of a word, but the stem
>> might not be a word.
>>
>> 1. Start using Solr 5.5. That automatically gets you four years of bug fixes
>>    and performance improvements.
>> 2. Look at the options for language analysis in the current release of Solr:
>>    https://cwiki.apache.org/confluence/display/solr/Language+Analysis
>> 3. Learn the analysis tool in the Solr admin UI. That allows you to explore
>>    the behavior.
>> 4. If you really need a high grade morphological analyzer, consider
>>    purchasing one from Basis Technology: http://www.rosette.com/solr/
>>
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/ (my blog)
>>
>>> On Apr 14, 2016, at 10:17 AM, Sara Woodmansee <swood...@gmail.com> wrote:
>>>
>>> Hello all,
>>>
>>> I posted yesterday, however I never received my own post, so worried it did
>>> not go through (?) Also, I am not a coder, so apologies if not appropriate
>>> to post here. I honestly don't know where else to turn, and am determined
>>> to find a solution, as search is essential to our site.
>>>
>>> We are having a website built with a search engine based on SOLR v3.6. For
>>> stemming, the developer uses EnglishMinimalStemFilterFactory. They were
>>> previously using PorterStemFilterFactory which worked better with plural
>>> forms, however PorterStemFilterFactory was not working correctly with –ing
>>> endings. “icing” becoming "ic", for example.
>>>
>>> Most search terms work fine, but we have inconsistent results (singular vs
>>> plural) with terms that end in -ee, -oe, -ie, -ae, and words that end in
>>> -s.
>>> In comparison, the following work fine: words that end with -oo, -ue,
>>> -e, -a.
>>>
>>> The developers have been unable to find a solution ("Unfortunately we tried
>>> to apply all the filters for stemming but this problem is not resolved"),
>>> but this has to be a common issue (?) Someone surely has found a solution
>>> to this problem??
>>>
>>> Any suggestions greatly appreciated.
>>>
>>> Many thanks!
>>> Sara
>>> _____________________
>>>
>>> DO NOT WORK: Plural terms that end in -ee, -oe, -ie, -ae, and words that
>>> end in -s.
>>>
>>> Examples:
>>>
>>> tree = 0 results
>>> trees = 21 results
>>>
>>> dungaree = 0 results
>>> dungarees = 1 result
>>>
>>> shoe = 0 results
>>> shoes = 1 result
>>>
>>> toe = 1 result
>>> toes = 0 results
>>>
>>> tie = 1 result
>>> ties = 0 results
>>>
>>> Cree = 0 results
>>> Crees = 1 result
>>>
>>> dais = 1 result
>>> daises = 0 results
>>>
>>> bias = 1 result
>>> biases = 0 results
>>>
>>> dress = 1 result
>>> dresses = 0 results
>>> _____________________
>>>
>>> WORKS: Words that end with -oo, -ue, -e, -a
>>>
>>> Examples:
>>>
>>> tide = 1 result
>>> tides = 1 result
>>>
>>> hue = 2 results
>>> hues = 2 results
>>>
>>> dakota = 1 result
>>> dakotas = 1 result
>>>
>>> loo = 1 result
>>> loos = 1 result
>>> _____________________
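
To make the rule Jack quotes a little more concrete, here is a rough Java sketch of how it could produce the singular/plural mismatches reported above. Only the vowel check on `s[len-3]` comes from the quoted snippet. The class name `MinimalStemSketch`, the length guard, the assumption that the check applies to "-es" endings, and the fallback of simply dropping a final "s" are all simplifications made for illustration. This is not the Lucene source, and the real EnglishMinimalStemmer has further rules that are not modeled here.

```java
// Sketch only: the vowel check is the rule quoted from EnglishMinimalStemmer.
// Everything else is a simplifying assumption, not the Lucene implementation.
final class MinimalStemSketch {

    // Returns the term with a trailing "s" removed, unless the quoted rule applies.
    static String stem(String term) {
        char[] s = term.toCharArray();
        int len = s.length;

        // Assumption: only terms of 3+ characters ending in "s" are candidates.
        if (len < 3 || s[len - 1] != 's') {
            return term;
        }

        // Quoted rule (assumed to apply to "-es" endings): if the letter before
        // "es" is i, a, o, or e, the term is returned unstemmed.
        if (s[len - 2] == 'e'
                && (s[len - 3] == 'i' || s[len - 3] == 'a'
                    || s[len - 3] == 'o' || s[len - 3] == 'e')) {
            return term;
        }

        // Assumption: every other plural just loses its final "s".
        return new String(s, 0, len - 1);
    }

    public static void main(String[] args) {
        String[] terms = {"tree", "trees", "shoe", "shoes", "toe", "toes",
                          "tide", "tides", "hue", "hues"};
        for (String t : terms) {
            System.out.println(t + " -> " + stem(t));
        }
    }
}
```

Under these assumptions "trees", "shoes", and "toes" come back unchanged, so they are indexed as different terms from "tree", "shoe", and "toe", while "tides" and "hues" reduce to "tide" and "hue" and match their singulars. The analysis tool in the Solr admin UI that Walter mentions is the place to check what the actual filter does with a given word.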