To handle irregular nouns (http://www.ef.com/english-resources/english-grammar/singular-and-plural-nouns/), the simplest way is to handle them with StemmerOverrideFilterFactory, placed before the stemmer in the analyzer chain. The list is not that long. Otherwise, go for a commercial solution such as basistech, as Alex suggested, or customize the Hunspell dictionary extensively to handle most of them.
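To make the override idea concrete, here is a rough Python sketch of the logic (this is not Solr code). The word list, the toy suffix stripper and all function names are made up for illustration. In Solr the mappings would live in the dictionary file that the override filter reads, and the real stemmer would be KStem, Snowball or Hunspell.

```python
# Hypothetical illustration only: the names and the word list below are
# invented for this sketch and are not part of Solr or Lucene.

# Irregular forms mapped to the form we want indexed (the "override" list).
OVERRIDES = {
    "wolves": "wolf",
    "children": "child",
    "mice": "mouse",
    "caught": "catch",
    "ran": "run",
}


def naive_stem(token):
    """Stand-in for a rule-based stemmer: strips a few common suffixes."""
    for suffix in ("ing", "es", "s"):
        # Only strip when at least three characters remain.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def normalize(token):
    """Check the override list first; fall back to the stemmer otherwise."""
    token = token.lower()
    if token in OVERRIDES:
        # Overridden tokens skip the stemmer entirely.
        return OVERRIDES[token]
    return naive_stem(token)


def analyze(text):
    """Whitespace-tokenize the text and normalize every token."""
    return [normalize(t) for t in text.split()]
```

With this, analyze("Wolves caught mice") gives wolf, catch, mouse, while the toy stemmer on its own would turn "wolves" into "wolv". Regular words such as "cats" still go through the stemmer as before.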

Thanks,
Susheel

On Thu, Dec 15, 2016 at 9:46 PM, Alexandre Rafalovitch <arafa...@gmail.com> wrote:

> If you need the full fidelity solution taking care of multiple
> edge-cases, it could be worth looking at commercial solutions.
>
> http://www.basistech.com/ has one, including a free-level SAAS plan.
>
> Regards,
>    Alex.
> ----
> http://www.solr-start.com/ - Resources for Solr users, new and experienced
>
>
> On 15 December 2016 at 21:28, Lasitha Wattaladeniya <watt...@gmail.com> wrote:
> > Hi all,
> >
> > Thanks for the replies,
> >
> > @eric, ahmet : since those stemmers are logical stemmers it won't work on
> > words such as caught, ran and so on. So in our case it won't work
> >
> > @susheel : Yes I thought about it but problems we have is, the documents we
> > index are some what large text, so copy fielding these into duplicate
> > fields will affect on the index time ( we have jobs to index data
> > periodically) and query time. I wonder why there isn't a correct solution
> > to this
> >
> > Regards,
> > Lasitha
> >
> > Lasitha Wattaladeniya
> > Software Engineer
> >
> > Mobile : +6593896893
> > Blog : techreadme.blogspot.com
> >
> > On Fri, Dec 16, 2016 at 12:58 AM, Susheel Kumar <susheel2...@gmail.com> wrote:
> >
> >> We did extensive comparison in the past for Snowball, KStem and Hunspell
> >> and there are cases where one of them works better but not other or
> >> vice-versa. You may utilise all three of them by having 3 different fields
> >> (fieldTypes) and during query, search in all of them.
> >>
> >> For some of the cases where none of them works (e.g wolves, wolf etc)., use
> >> StemOverriderFactory.
> >>
> >> HTH.
> >>
> >> Thanks,
> >> Susheel
> >>
> >> On Thu, Dec 15, 2016 at 11:32 AM, Ahmet Arslan <iori...@yahoo.com.invalid> wrote:
> >>
> >> > Hi,
> >> >
> >> > KStemFilter returns legitimate English words, please use it.
> >> >
> >> > Ahmet
> >> >
> >> >
> >> > On Thursday, December 15, 2016 6:17 PM, Lasitha Wattaladeniya <watt...@gmail.com> wrote:
> >> > Hello devs,
> >> >
> >> > I'm trying to develop this indexing and querying flow where it converts the
> >> > words to its original form (lemmatization). I was doing bit of research
> >> > lately but the information on the internet is very limited. I tried using
> >> > hunspellfactory but it doesn't convert the word to it's original form,
> >> > instead it gives suggestions for some words (hunspell works for some
> >> > english words correctly but for some it gives multiple suggestions or no
> >> > suggestions, i used the en_us.dic provided by openoffice)
> >> >
> >> > I know this is a generic problem in searching, so is there anyone who can
> >> > point me to correct direction or some information :)
> >> >
> >> > Best regards,
> >> > Lasitha Wattaladeniya
> >> > Software Engineer
> >> >
> >> > Mobile : +6593896893
> >> > Blog : techreadme.blogspot.com