Right, instead of this in the synonyms file:

  responsibility, obligation, duty

 
I could stem each of the above words/synonyms and have something like this in
the synonyms file:

  respons, oblig, duti

But somehow this feels bad (well, so does sticking word variations in what's 
supposed to be a synonyms file), partly because it means that the person adding 
new synonyms would need to know what they stem to (or always check it against 
Solr before editing the file).
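
A minimal sketch of what that stemming step could look like, so nobody has to
stem by hand. It only handles plain comma-separated lines, not "=>" mappings or
comments. STEMS and toy_stem are hypothetical stand-ins for whatever stemmer the
Solr analyzer actually uses; in practice the stems would have to come from the
same analysis chain.

```python
# Sketch only: rewrite a comma-separated synonyms line so each term is stemmed.
# STEMS / toy_stem are hypothetical placeholders for the real stemmer output.

STEMS = {
    "responsibility": "respons",
    "responsibilities": "respons",
    "obligation": "oblig",
    "obligations": "oblig",
    "duty": "duti",
    "duties": "duti",
}


def toy_stem(term):
    """Look up a stem in the toy table; fall back to the lowercased term."""
    return STEMS.get(term.lower(), term.lower())


def stem_synonym_line(line, stem):
    """Stem every term on one synonyms line, dropping duplicate stems."""
    stemmed = []
    for term in line.split(","):
        term = term.strip()
        if not term:
            continue
        s = stem(term)
        if s not in stemmed:
            stemmed.append(s)
    return ", ".join(stemmed)


if __name__ == "__main__":
    print(stem_synonym_line("responsibility, obligation, duty", toy_stem))
```

Running it as a script on the original line gives "respons, oblig, duti", so
the hand-edited file could stay in unstemmed form and the stemmed file would be
generated from it.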

I've never seen anyone actually use such a synonyms file in production, have 
you?

Thanks,
Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



----- Original Message ----
> From: Lance Norskog <goks...@gmail.com>
> To: solr-user@lucene.apache.org
> Sent: Tue, April 26, 2011 12:20:05 AM
> Subject: Re: Automatic synonyms for multiple variations of a word
> 
> This has come up with stemming: you can stem your synonym list with
> the FieldAnalyzer Solr http call, then save the final chewed-up terms
> as a new synonym file. You then use that one in the analyzer stack
> below the stemmer filter.
> 
> On Mon, Apr 25, 2011 at 9:15 PM, Otis Gospodnetic
> <otis_gospodne...@yahoo.com>  wrote:
> > Hi Otis & Robert,
> >
> >  ----- Original Message  ----
> >
> >>
> >> How do people handle cases where synonyms are used and there are multiple
> >> versions of the original word that really need to point to the same set of
> >> synonyms?
> >>
> >> For example:
> >> Consider singular and plural of the word "responsibility". One might have
> >> synonyms defined like this:
> >>
> >>   responsibility, obligation, duty
> >>
> >> But the plural "responsibilities" is not in there, and thus it will not get
> >> expanded to the synonyms above! That's a problem.
> >>
> >> Sure, one could change the synonyms file to look like this:
> >>
> >>   responsibility, responsibilities, obligation, duty
> >>
> >> But that means somebody needs to think of all variations of the word!
> >
> > Yes, that seems to be the case now, as it was in 2008:
> > http://search-lucene.com/m/gLwUCV0qU02&subj=Re+Synonyms+and+stemming+revisited
> > http://search-lucene.com/m/7lqdp1ldrqx (Hoss replied, but I think that
> > suggestion doesn't actually work)
> >
> >> Is there something one can do to get all variations of the word to map to
> >> the same synonyms without having to explicitly specify all variations of
> >> the word?
> >
> > I think this is where Robert's 2+2lemma pointer may help because the
> > 2+2lemma list contains "records" where a headword is followed by a list of
> > other variations of the word. The way I think this would help is by simply
> > taking that list and turning it into the synonyms file format, and then
> > merging in the actual synonyms.
> >
> > For example, if I have the word "responsibility", then from 2+2lemma I
> > should be able to get that "responsibilities" is one of the variants of
> > "responsibility". I should then be able to take those 2 words and stick
> > them in the synonyms file like this:
> >
> >  responsibility, responsibilities
> >
> > And then append actual synonyms to that:
> >
> >  responsibility, responsibilities, obligation, duty
> >
> > But I may then need to actually expand synonyms themselves, too (again using
> > data from 2+2lemma):
> >
> >  responsibility, responsibilities, obligation, obligations, duty, duties
> >
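
The expansion described above could be sketched roughly like this. It assumes
the 2+2lemma records have already been parsed into a dict from headword to its
variants; the VARIANTS table and the function name are hypothetical, and
reading the actual 2+2lemma file is left out.

```python
# Sketch only: expand each term on a synonyms line with its known word variants.
# VARIANTS is a hypothetical, hand-written stand-in for parsed 2+2lemma data.

VARIANTS = {
    "responsibility": ["responsibilities"],
    "obligation": ["obligations"],
    "duty": ["duties"],
}


def expand_synonym_line(line, variants):
    """Follow each term with its variants, keeping order and skipping repeats."""
    expanded = []
    for term in line.split(","):
        term = term.strip()
        if not term:
            continue
        for word in [term] + variants.get(term, []):
            if word not in expanded:
                expanded.append(word)
    return ", ".join(expanded)


if __name__ == "__main__":
    print(expand_synonym_line("responsibility, obligation, duty", VARIANTS))
```

Terms with no entry in the table are passed through unchanged, so a line that
already lists some variants by hand would not end up with duplicates.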
> >
> > I haven't tried this yet. Just theorizing and hoping for feedback.
> >
> > Does this sound about right?
> >
> >  Thanks,
> > Otis
> >
> >
> 
> 
> 
> -- 
> Lance  Norskog
> goks...@gmail.com
> 
