Hi Walter and Jack,

Many thanks for your feedback!

I have no idea why the developer is using such an old version, but I am hoping 
that your feedback and suggestions will give them a push in the right direction.

Is it a huge undertaking to upgrade from v3.6 to v5.5?? (I surely hope not.)

Thanks again,
Sara


On Apr 14, 2016, at 2:55 PM, Jack Krupansky <jack.krupan...@gmail.com> wrote:
> 
> Yes, this is the intended behavior. All of the Solr stemmers are based on
> heuristics that are not perfect, and are not based on a real dictionary.
> You can solve one problem by switching to another stemmer, but then you run
> into a different problem, rinse and repeat.
> 
> The code has a specific rule that refrains from stemming a pattern that
> also happens to match your specified cases:
> 
>        if (s[len-3] == 'i' || s[len-3] == 'a' || s[len-3] == 'o' || s[len-3] == 'e')
>          return len;
> 
> See:
> https://github.com/apache/lucene-solr/blob/branch_3x/lucene/contrib/analyzers/common/src/java/org/apache/lucene/analysis/en/EnglishMinimalStemmer.java
> 
> So, xxxies, xxxaes, xxxoes, and xxxees will all remain unstemmed. Exactly
> what the rationale for that rule was is unspecified in the code - no
> comments, other than to point to this research document:
> https://www.researchgate.net/publication/220433848_How_effective_is_suffixing 
> 
> 
> -- Jack Krupansky
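
For reference, the rule quoted above can be sketched as a small standalone Java method. This is only an illustration, not the Lucene source: the class and method names (MinimalStemSketch, stem) are hypothetical, and everything except the quoted vowel check (the -us/-ss guard, the -ies handling, and the final "drop the s" step) is an assumption modeled on the S-stemmer rules from the research document linked above.

```java
// Hypothetical sketch of S-stemmer style plural handling; not the Lucene class.
class MinimalStemSketch {

    // Returns the stem of a single lowercase word.
    static String stem(String word) {
        int len = word.length();
        // Only words of 3+ letters that end in 's' are candidates.
        if (len < 3 || word.charAt(len - 1) != 's') {
            return word;
        }
        char beforeS = word.charAt(len - 2);
        // Assumption: words ending in -us or -ss are left alone.
        if (beforeS == 'u' || beforeS == 's') {
            return word;
        }
        if (beforeS == 'e') {
            char third = word.charAt(len - 3);
            // Assumption: -ies becomes -y unless preceded by 'a' or 'e'.
            if (len > 3 && third == 'i'
                    && word.charAt(len - 4) != 'a' && word.charAt(len - 4) != 'e') {
                return word.substring(0, len - 3) + "y";
            }
            // The rule quoted in the thread: a vowel before -es blocks stemming.
            if (third == 'i' || third == 'a' || third == 'o' || third == 'e') {
                return word;
            }
        }
        // Assumption: otherwise only the trailing 's' is removed.
        return word.substring(0, len - 1);
    }

    public static void main(String[] args) {
        String[] words = {"tree", "trees", "shoe", "shoes", "tie", "ties",
                          "dress", "dresses", "tide", "tides"};
        for (String w : words) {
            System.out.println(w + " -> " + stem(w));
        }
    }
}
```

Under these assumptions, "tides" is reduced to "tide" and matches the singular, while "trees" and "shoes" are left unstemmed. "dresses" becomes "dresse", which does not match "dress", consistent with the -s cases reported below. Note that with the assumed -ies rule, "ties" becomes "ty" rather than staying unstemmed; either way it does not match "tie".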

> 
> 
>> On Apr 14, 2016, at 1:44 PM, Walter Underwood <wun...@wunderwood.org> wrote:
>> 
>> Solr 3.6 is a VERY old release. You won’t see any fixes for that.
>> 
>> I would recommend starting with Solr 5.5 and keeping an eye on Solr 6.x, 
>> which has just started releases.
>> 
>> Removing -ing endings is pretty aggressive. That changes “tracking meeting” 
>> into “track meet”. Most of the time, you’ll be better off with an 
>> inflectional stemmer that just converts plurals to singulars and other 
>> similar changes.
>> 
>> The Porter stemmer does not produce dictionary words. It produces “stems”. 
>> Those are the same for the singular and plural forms of a word, but the stem 
>> might not be a word.
>> 
>> 1. Start using Solr 5.5. That automatically gets you four years of bug fixes 
>> and performance improvements.
>> 2. Look at the options for language analysis in the current release of Solr: 
>> https://cwiki.apache.org/confluence/display/solr/Language+Analysis 
>> 3. Learn the analysis tool in the Solr admin UI. That allows you to explore 
>> the behavior.
>> 4. If you really need a high grade morphological analyzer, consider 
>> purchasing one from Basis Technology: http://www.rosette.com/solr/ 
>> 
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>> 
>>> On Apr 14, 2016, at 10:17 AM, Sara Woodmansee <swood...@gmail.com> wrote:
>>> 
>>> Hello all,
>>> 
>>> I posted yesterday, but I never received my own post, so I am worried it did 
>>> not go through. Also, I am not a coder, so apologies if this is not an 
>>> appropriate place to post. I honestly don't know where else to turn, and am 
>>> determined to find a solution, as search is essential to our site.
>>> 
>>> We are having a website built with a search engine based on SOLR v3.6. For 
>>> stemming, the developer uses EnglishMinimalStemFilterFactory. They were 
>>> previously using PorterStemFilterFactory, which worked better with plural 
>>> forms; however, PorterStemFilterFactory was not working correctly with -ing 
>>> endings (“icing” became “ic”, for example).
>>> 
>>> Most search terms work fine, but we have inconsistent results (singular vs 
>>> plural) with terms that end in -ee, -oe, -ie, -ae,  and words that end in 
>>> -s.  In comparison, the following work fine: words that end with -oo, -ue, 
>>> -e, -a.
>>> 
>>> The developers have been unable to find a solution ("Unfortunately we tried 
>>> to apply all the filters for stemming but this problem is not resolved"), 
>>> but this has to be a common issue (?) Someone surely has found a solution 
>>> to this problem?? 
>>> 
>>> Any suggestions greatly appreciated.
>>> 
>>> Many thanks!
>>> Sara 
>>> _____________________
>>> 
>>> DO NOT WORK:  Plural terms that end in -ee, -oe, -ie, -ae,  and words that 
>>> end in -s.  
>>> 
>>> Examples: 
>>> 
>>> tree = 0 results
>>> trees = 21 results
>>> 
>>> dungaree = 0 results
>>> dungarees = 1 result
>>> 
>>> shoe = 0 results
>>> shoes = 1 result
>>> 
>>> toe = 1 result
>>> toes = 0 results
>>> 
>>> tie = 1 result
>>> ties = 0 results
>>> 
>>> Cree = 0 results
>>> Crees = 1 result
>>> 
>>> dais = 1 result
>>> daises = 0 results
>>> 
>>> bias = 1 result
>>> biases = 0 results
>>> 
>>> dress = 1 result
>>> dresses = 0 results
>>> _____________________
>>> 
>>> WORKS:  Words that end with -oo, -ue, -e, -a
>>> 
>>> Examples: 
>>> 
>>> tide = 1 result
>>> tides = 1 result
>>> 
>>> hue = 2 results
>>> hues = 2 results
>>> 
>>> dakota = 1 result
>>> dakotas = 1 result
>>> 
>>> loo = 1 result
>>> loos = 1 result
>>> _____________________
>>> 
>> 
