Hi Edwin,
Hunspell is a configurable, language-independent library, and you can define any morphology rules you need. It has been around for a while, and I would not be surprised if someone has already adjusted the English rules to suit your case.
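Here is a minimal Python sketch of how Hunspell-style suffix rules drive stemming. It is a simplified model, not Lucene's HunspellStemFilter. It only handles SFX lines and ignores prefixes, continuation classes and the other .aff options. The two rules and three dictionary entries are hypothetical, modeled on the `SFX <flag> <strip> <affix> <condition>` rule format and the `word/FLAGS` dictionary format. The sketch shows one thing: a suffix is removed only when the resulting stem is in the dictionary with the matching flag. A word like "ximenting", which has no dictionary entry, is therefore left alone.

```python
import re

# Hypothetical affix rules in the Hunspell .aff suffix format:
#   SFX <flag> <cross_product> <count>       (header line)
#   SFX <flag> <strip> <affix> <condition>   (rule line; "0" = strip nothing)
AFF = """\
SFX G Y 2
SFX G e ing e
SFX G 0 ing [^e]
"""

# Hypothetical dictionary in the .dic format: a count line, then word/FLAGS.
DIC = """\
3
walk/G
make/G
park
"""


def parse_aff(text):
    """Return (flag, strip, affix, condition_regex) tuples for SFX rule lines."""
    rules = []
    for line in text.splitlines():
        parts = line.split()
        # Rule lines have five fields; the header line has four.
        if len(parts) == 5 and parts[0] == "SFX":
            _, flag, strip, affix, condition = parts
            strip = "" if strip == "0" else strip
            # The condition is checked against the end of the candidate stem.
            rules.append((flag, strip, affix, re.compile(condition + "$")))
    return rules


def parse_dic(text):
    """Map each dictionary word to its set of single-character flags."""
    entries = {}
    for line in text.splitlines()[1:]:  # skip the leading word count
        word, _, flags = line.partition("/")
        entries[word] = set(flags)
    return entries


def stems(word, rules, dictionary):
    """Collect dictionary-backed stems of `word` under the suffix rules."""
    found = [word] if word in dictionary else []
    for flag, strip, affix, condition in rules:
        if word.endswith(affix) and len(word) > len(affix):
            # Undo the suffix: drop the affix, restore the stripped characters.
            candidate = word[: -len(affix)] + strip
            if condition.search(candidate) and flag in dictionary.get(candidate, set()):
                found.append(candidate)
    return found


RULES = parse_aff(AFF)
DICTIONARY = parse_dic(DIC)


def stem_token(word):
    # Words with no dictionary-backed stem are passed through unchanged.
    found = stems(word, RULES, DICTIONARY)
    return found[0] if found else word


print(stem_token("walking"), stem_token("making"), stem_token("ximenting"))
# walk make ximenting
```

In this model, if `walk` were listed without the `G` flag, `walking` would come back unchanged. So when English words such as "walking" are not stemmed, the dictionary entries and their flags are a reasonable first place to look.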
Thanks,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/

> On 3 Nov 2017, at 04:25, Zheng Lin Edwin Yeo <edwinye...@gmail.com> wrote:
>
> Hi Emir,
>
> We are looking to change to HunspellStemFilterFactory. This has a
> dictionary file containing words and applicable flags, and an affix file
> that specifies how these flags control spell checking.
> Perhaps we can control this through those files in
> HunspellStemFilterFactory?
>
> Regards,
> Edwin
>
> On 2 November 2017 at 17:46, Emir Arnautović <emir.arnauto...@sematext.com> wrote:
>
>> Hi Edwin,
>> It seems that it would be best if you do not apply the *ing stemming rule
>> at all. The first idea is to trick the stemmer: replace the "ing" ending of
>> any word with some non-existent character combination, e.g. 'wqx'. You can
>> use solr.PatternReplaceFilterFactory to do that, and switch it back after
>> stemming if you want to have the proper token in the index.
>>
>> HTH,
>> Emir
>>
>>> On 2 Nov 2017, at 03:23, Zheng Lin Edwin Yeo <edwinye...@gmail.com> wrote:
>>>
>>> Hi Emir,
>>>
>>> We do have quite a lot of words that should not be stemmed. Currently,
>>> KStemFilterFactory is also stemming all the non-English words that end
>>> with "ing". There are quite a lot of places and names that end in "ing",
>>> and all of these are being stemmed as well, which leads to inaccurate
>>> search results.
>>>
>>> Regards,
>>> Edwin
>>>
>>> On 1 November 2017 at 18:20, Emir Arnautović <emir.arnauto...@sematext.com> wrote:
>>>
>>>> Hi Edwin,
>>>> If the number of words that should not be stemmed is not high, you could
>>>> use KeywordMarkerFilterFactory to flag those words as keywords, which
>>>> should prevent the stemmer from changing them.
>>>>
>>>> Depending on what you want to achieve, you might not be able to avoid
>>>> using a stemmer at indexing time. If you want a search for "walk" to find
>>>> documents that contain only "walking", then you have to stem at index
>>>> time. Cases where stemming is used only at query time are rare and
>>>> specific. If you want to prefer exact matches over stemmed matches, you
>>>> have to index the same content both with and without stemming and boost
>>>> matches on the field without stemming.
>>>>
>>>> HTH,
>>>> Emir
>>>>
>>>>> On 1 Nov 2017, at 10:11, Zheng Lin Edwin Yeo <edwinye...@gmail.com> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> We are currently using KStemFilterFactory in Solr, but we found that it
>>>>> is also stemming non-English words like "ximenting", which it stems to
>>>>> "ximent". This is not what we want.
>>>>>
>>>>> Another option is to use HunspellStemFilterFactory, but some English
>>>>> words like "running" and "walking" are not being stemmed by it.
>>>>>
>>>>> Is it advisable to use stemming at index time? Or should we skip
>>>>> stemming at index time and instead, at query time, also search for the
>>>>> stemmed words? For example, if the user searches for "walking", we
>>>>> would search for "walk" as well, and give the exact word "walking" a
>>>>> higher weight.
>>>>>
>>>>> I'm currently using Solr 6.5.1.
>>>>>
>>>>> Regards,
>>>>> Edwin
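Emir's 1 and 2 November messages suggest two workarounds. One marks protected words as keywords. The other hides the "ing" ending from the stemmer with a pattern replacement. Both can be sketched in plain Python, as below. This only illustrates the idea and is not Solr code. `toy_stem` is a hypothetical stand-in for KStem that strips "ing" or a plural "s". The two chain functions model the order of filters: KeywordMarkerFilterFactory goes before the stemmer, and solr.PatternReplaceFilterFactory runs once before it and once after it. In Solr these would be filters in the field type's analyzer rather than Python functions.

```python
import re


def toy_stem(token):
    # Hypothetical stand-in for KStem: strips "ing" or a plural "s"
    # from tokens long enough to keep a short stem.
    for suffix in ("ing", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def keyword_marker_chain(tokens, protected):
    # Tokens in `protected` are flagged as keywords and bypass the stemmer,
    # modeling KeywordMarkerFilterFactory placed before the stemming filter.
    return [t if t in protected else toy_stem(t) for t in tokens]


def pattern_replace_chain(tokens):
    # Hide a trailing "ing" behind a non-existent combination ("wqx") so the
    # stemmer cannot match it, then restore it after stemming, modeling two
    # PatternReplaceFilterFactory steps around the stemmer.
    hidden = [re.sub(r"ing$", "wqx", t) for t in tokens]
    stemmed = [toy_stem(t) for t in hidden]
    return [re.sub(r"wqx$", "ing", t) for t in stemmed]


tokens = ["ximenting", "walking", "walks"]
print(keyword_marker_chain(tokens, protected={"ximenting"}))
# ['ximenting', 'walk', 'walk']
print(pattern_replace_chain(tokens))
# ['ximenting', 'walking', 'walk']
```

The results show the trade-off. The keyword-marker approach needs an explicit list of protected words, but it still stems English "ing" forms. The pattern trick needs no list, but it turns off "ing" stemming for every word.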
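Emir's 1 November advice has two parts: stem at index time, and index the content a second time without stemming so that exact matches can be boosted. It can be modeled the same way. The sketch below uses the same hypothetical `toy_stem` as a stand-in for a real stemmer. Each document gets an `exact` and a `stemmed` token set. These are hypothetical names for what would be two Solr fields fed from the same text, for example through copyField. The score adds a boost for exact-field matches, loosely like an edismax query with something along the lines of `qf=exact^2 stemmed`.

```python
def toy_stem(token):
    # Hypothetical stand-in for a real stemmer such as KStem.
    for suffix in ("ing", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def build_index(docs):
    # Index every document twice: as-is ("exact") and stemmed ("stemmed"),
    # modeling two fields populated from the same source text.
    index = {}
    for doc_id, text in docs.items():
        tokens = text.lower().split()
        index[doc_id] = {
            "exact": set(tokens),
            "stemmed": {toy_stem(t) for t in tokens},
        }
    return index


def search(index, query, exact_boost=2.0):
    # Stemmed-field matches score 1; exact-field matches add `exact_boost`,
    # so documents containing the literal query term rank first.
    terms = query.lower().split()
    scores = {}
    for doc_id, fields in index.items():
        score = sum(exact_boost for t in terms if t in fields["exact"])
        score += sum(1.0 for t in terms if toy_stem(t) in fields["stemmed"])
        if score > 0:
            scores[doc_id] = score
    # Highest score first; ties broken by document id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


docs = {"a": "Walking in the park", "b": "A short walk"}
idx = build_index(docs)
print(search(idx, "walking"))  # ['a', 'b']
print(search(idx, "walk"))     # ['b', 'a']
```

A query for "walk" still finds document `a`, which only contains "walking", because the stemmed field was built at index time. A query for "walking" ranks `a` above `b` because of the exact-field boost. Compare the query-time-only approach proposed in the 1 November question. There the index would hold only "walking" for document `a`, so a search for "walk" would not match it.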