Re: Advice on Stemming in Solr

2017-11-04 Thread Zheng Lin Edwin Yeo
Hi Emir, We are looking at the configuration to try to adjust the rules to suit our use case. Regards, Edwin On 3 November 2017 at 16:24, Emir Arnautović wrote: > Hi Edwin, > Hunspell is a configurable, language-independent library and you can define > any morphology rules. It’s been there for

Re: Advice on Stemming in Solr

2017-11-03 Thread Emir Arnautović
Hi Edwin, Hunspell is a configurable, language-independent library and you can define any morphology rules. It’s been there for a while and I would not be surprised if someone has already adjusted the English rules to suit your case. Thanks, Emir

Re: Advice on Stemming in Solr

2017-11-02 Thread Zheng Lin Edwin Yeo
Hi Emir, We are looking to change to HunspellStemFilterFactory. This has a dictionary file containing words and applicable flags, and an affix file that specifies how these flags will control spell checking. Perhaps we can control the stemming from those files in HunspellStemFilterFactory? Regards, Edwin
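
For readers following the thread, a minimal sketch of how HunspellStemFilterFactory is typically wired into a field type. The field type name and the en_US.dic / en_US.aff file names are assumptions for illustration only; the thread does not say which dictionary Edwin uses. The dictionary and affix attributes point at the two files described above, so editing those files is where the stemming rules would be adjusted.

```xml
<!-- Hypothetical field type; name and file names are illustrative assumptions. -->
<fieldType name="text_en_hunspell" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- dictionary: words plus their flags; affix: rules attached to those flags -->
    <filter class="solr.HunspellStemFilterFactory"
            dictionary="en_US.dic"
            affix="en_US.aff"
            ignoreCase="true"/>
  </analyzer>
</fieldType>
```

Both files are expected to live in the core's conf directory (or another location the resource loader can resolve), and a reindex is needed after changing them.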

Re: Advice on Stemming in Solr

2017-11-02 Thread Emir Arnautović
Hi Edwin, It seems it would be best not to apply the *ing stemming rule at all. The first idea is to trick the stemmer by replacing the ending of any word that ends with "ing" with some nonexistent character combination, e.g. ‘wqx’. You can use solr.PatternReplaceFilterFactory to do that. You can switch it back afte
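
A rough sketch of the trick described in this message, assuming the switch back is done with a second PatternReplaceFilterFactory placed after the stemmer (the message is cut off at that point, so this placement is an assumption). The field type name is hypothetical, and ‘wqx’ is simply the placeholder suggested above; it only works if no real token in the data ends with it.

```xml
<!-- Hypothetical field type; illustrates hiding the "ing" ending from the stemmer. -->
<fieldType name="text_en_keep_ing" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Step 1: mask a trailing "ing" with a character combination the stemmer has no rule for -->
    <filter class="solr.PatternReplaceFilterFactory" pattern="ing$" replacement="wqx" replace="first"/>
    <!-- Step 2: stem as usual; tokens ending in "wqx" are expected to pass through unchanged -->
    <filter class="solr.KStemFilterFactory"/>
    <!-- Step 3: restore the original ending -->
    <filter class="solr.PatternReplaceFilterFactory" pattern="wqx$" replacement="ing" replace="first"/>
  </analyzer>
</fieldType>
```

Note that this turns off "ing" stemming for every word, English ones included, so "running" would no longer match "run". The same analyzer has to apply at both index and query time for the terms to line up.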

Re: Advice on Stemming in Solr

2017-11-01 Thread Zheng Lin Edwin Yeo
Hi Emir, We do have quite a lot of words that should not be stemmed. Currently, the KStemFilterFactory is stemming all the non-English words that end with "ing" as well. There are quite a lot of places and names which end in "ing", and all these are being stemmed as well, which leads to an inaccur

Re: Advice on Stemming in Solr

2017-11-01 Thread Emir Arnautović
Hi Edwin, If the number of words that should not be stemmed is not high, you could use KeywordMarkerFilterFactory to flag those words as keywords, and it should prevent the stemmer from changing them. Depending on what you want to achieve, you might not be able to avoid using a stemmer at indexing time.
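
As an illustration of the KeywordMarkerFilterFactory suggestion, a minimal sketch of an analyzer chain. The field type name is hypothetical, and protwords.txt is assumed to be a plain text file in the core's conf directory with one protected word per line; neither is taken from the thread. The marker filter has to come before the stemmer so the keyword flag is already set when the stemmer sees the token.

```xml
<!-- Hypothetical field type; protwords.txt is an assumed file listing words to protect. -->
<fieldType name="text_en_protected" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Flag every token found in protwords.txt as a keyword -->
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt" ignoreCase="true"/>
    <!-- The stemmer leaves keyword-flagged tokens untouched -->
    <filter class="solr.KStemFilterFactory"/>
  </analyzer>
</fieldType>
```

This approach suits a short, known list of names and places; it does not scale well when the set of words to protect is large or open-ended, which is the situation raised in the reply above.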