Right, instead of this in the synonyms file:

responsibility, obligation, duty

I could stem each of the above words/synonyms and have something like this in the synonyms file:

respons, oblig, duti

But somehow this feels bad (well, so does sticking word variations in what's supposed to be a synonyms file), partly because it means that the person adding new synonyms would need to know what they stem to (or always check it against Solr before editing the file).

I've never seen anyone actually use such a synonyms file in production, have you?

Thanks,
Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/

----- Original Message ----
> From: Lance Norskog <goks...@gmail.com>
> To: solr-user@lucene.apache.org
> Sent: Tue, April 26, 2011 12:20:05 AM
> Subject: Re: Automatic synonyms for multiple variations of a word
>
> This has come up with stemming: you can stem your synonym list with
> the FieldAnalyzer Solr http call, then save the final chewed-up terms
> as a new synonym file. You then use that one in the analyzer stack
> below the stemmer filter.
>
> On Mon, Apr 25, 2011 at 9:15 PM, Otis Gospodnetic
> <otis_gospodne...@yahoo.com> wrote:
> > Hi Otis & Robert,
> >
> > ----- Original Message ----
> >
> >>
> >> How do people handle cases where synonyms are used and there are multiple
> >> versions of the original word that really need to point to the same set of
> >> synonyms?
> >>
> >> For example:
> >> Consider singular and plural of the word "responsibility". One might have
> >> synonyms defined like this:
> >>
> >> responsibility, obligation, duty
> >>
> >> But the plural "responsibilities" is not in there, and thus it will not get
> >> expanded to the synonyms above! That's a problem.
> >>
> >> Sure, one could change the synonyms file to look like this:
> >>
> >> responsibility, responsibilities, obligation, duty
> >>
> >> But that means somebody needs to think of all variations of the word!
> >
> > Yes, that seems to be the case now, as it was in 2008:
> > http://search-lucene.com/m/gLwUCV0qU02&subj=Re+Synonyms+and+stemming+revisited
> > http://search-lucene.com/m/7lqdp1ldrqx (Hoss replied, but I think that
> > suggestion doesn't actually work)
> >
> >> Is there something one can do to get all variations of the word to map to
> >> the same synonyms without having to explicitly specify all variations of
> >> the word?
> >
> > I think this is where Robert's 2+2lemma pointer may help because the 2+2lemma
> > list contains "records" where a headword is followed by a list of other
> > variations of the word. The way I think this would help is by simply taking
> > that list and turning it into the synonyms file format, and then merging in
> > the actual synonyms.
> >
> > For example, if I have the word "responsibility", then from 2+2lemma I should
> > be able to get that "responsibilities" is one of the variants of
> > "responsibility". I should then be able to take those 2 words and stick them
> > in the synonyms file like this:
> >
> > responsibility, responsibilities
> >
> > And then append actual synonyms to that:
> >
> > responsibility, responsibilities, obligation, duty
> >
> > But I may then need to actually expand synonyms themselves, too (again using
> > data from 2+2lemma):
> >
> > responsibility, responsibilities, obligation, obligations, duty, duties
> >
> > I haven't tried this yet. Just theorizing and hoping for feedback.
> >
> > Does this sound about right?
> >
> > Thanks,
> > Otis
>
> --
> Lance Norskog
> goks...@gmail.com
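
For what it's worth, the expansion idea from the quoted message could be sketched roughly like this in Python. This is untested against real data and only a sketch: the VARIANTS table is a hand-written, hypothetical stand-in for whatever one would parse out of the 2+2lemma list (its real file format is not shown in this thread), and expand_synonym_line is a made-up helper name, not anything from Solr.

```python
def expand_synonym_line(line, variants):
    """Expand one comma-separated synonym line with known word variants.

    `variants` maps a headword to a list of its other forms. Each term is
    followed by its variants, and duplicates are dropped while keeping order.
    """
    expanded = []
    for term in (t.strip() for t in line.split(",")):
        if not term:
            continue  # skip empty entries such as a trailing comma
        for word in [term] + variants.get(term, []):
            if word not in expanded:
                expanded.append(word)
    return ", ".join(expanded)


# Hypothetical variant table (assumption: in practice this would be built
# from a lemma list such as 2+2lemma rather than typed by hand).
VARIANTS = {
    "responsibility": ["responsibilities"],
    "obligation": ["obligations"],
    "duty": ["duties"],
}

print(expand_synonym_line("responsibility, obligation, duty", VARIANTS))
# responsibility, responsibilities, obligation, obligations, duty, duties
```

The person editing synonyms would keep maintaining the short hand-written line, and the expanded file would be regenerated from it, so nobody has to remember every variation of every word.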