Re: Query on Synonyms feature in Solr

rajini maski Wed, 15 Jun 2011 09:38:06 -0700

OK, thank you. I will consider this.

One last doubt: how do I handle negation terms?


As I mentioned in the mail above, suppose I have three sentences like this:

1. tissue devitalization was observed in hepalocytes of liver
2. necrosis was observed in liver
3. Necrosis not found in liver

When I search for "Necrosis not found", I need to get only the last sentence,
but right now I get all three results.

I am not able to work out which tokenizers and analyzers I need to
apply in order to achieve this output.

Awaiting reply
Rajani Maski




As explained in the above mail,

On Wed, Jun 15, 2011 at 9:42 PM, Erick Erickson <erickerick...@gmail.com> wrote:

> Well, first, it is usually unnecessary to specify the
> synonym filter at both index and query time. I'd apply
> it only at query time to start, then perhaps switch
> to index time; see the discussion at:
>
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#head-2c461ac74b4ddd82e453dc68fcfc92da77358d46
> for why index-time is preferable.
> Note that you'll have to re-index whenever you change the index-time analysis.
>
> That said, essentially what happens (and assuming
> synonym filter is only in the query part) is you have
> something like this as your search for "necrosis not
> found".
>
> Offset 0                 Offset 1    Offset 2
> necrosis                 not         found
> tissue devitalization
> cellular necrosis
>
>
> Note that one of your three synonyms must appear in position 0,
> followed by the other two terms.
>
> So your example should "just work". But as I said, it would probably
> be best to put your synonym filter at either index time or query time, not both.
>
> An analogous process happens if you add synonyms at index
> time.
>
> Best
> Erick
>
> On Wed, Jun 15, 2011 at 8:14 AM, rajini maski <rajinima...@gmail.com> wrote:
> > Erick: I have tried what you said, but I need clarification on this. My
> > doubt is below:
> >
> > Say I have this field type:
> >
> > <fieldType name="Synonymdata" class="solr.TextField"
> >            positionIncrementGap="100">
> >   <analyzer type="index">
> >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >     <filter class="org.apache.solr.orchsynonym.OrchSynonymFilter"
> >             synonyms="BODYTaxonomy.txt,PalpClinLocObsTaxo.txt,MacroscopicTaxonomy.txt,MicroscopicTaxonomy.txt,SpecimenTaxonomy.txt,ParameterTaxonomy.txt,StrainTaxonomy.txt"
> >             ignoreCase="true" expand="true"/>
> >     <filter class="solr.LowerCaseFilterFactory"/>
> >     <filter class="solr.SnowballPorterFilterFactory" language="English"
> >             protected="protwords.txt"/>
> >   </analyzer>
> >   <analyzer type="query">
> >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >     <filter class="org.apache.solr.orchsynonym.OrchSynonymFilter"
> >             synonyms="BODYTaxonomy.txt,PalpClinLocObsTaxo.txt,MacroscopicTaxonomy.txt,MicroscopicTaxonomy.txt,SpecimenTaxonomy.txt,ParameterTaxonomy.txt,StrainTaxonomy.txt"
> >             ignoreCase="true" expand="false"/>
> >     <filter class="solr.StopFilterFactory" ignoreCase="true"
> >             words="stopwords.txt" enablePositionIncrements="true"/>
> >     <filter class="solr.LowerCaseFilterFactory"/>
> >     <filter class="solr.SnowballPorterFilterFactory" language="English"
> >             protected="protwords.txt"/>
> >   </analyzer>
> > </fieldType>
> >
> >
> >
> > The data indexed in this field is:
> >
> > sentence 1: "tissue devitalization was noted in hepalocytes of liver"
> > sentence 2: "Necrosis not found in liver"
> >
> > Synonyms:
> > necrosis, tissue devitalization, cellular necrosis
> >
> > How do the whitespace tokenizer and the synonym filter behave? I am not able
> > to understand it from the analysis page. Please let me know whether it works
> > like this, and correct me if I am wrong.
> >
> > sentence 1: "tissue devitalization was noted in hepalocytes of liver"
> >
> > Whitespace tokens:
> > tissue
> > devitalization
> > was
> > noted
> > in
> > hepalocytes
> > of
> > liver
> >
> > Synonyms for the tokens:
> > No synonym for "tissue", no synonym for "devitalization", and so on.
> > So will "tissue devitalization" not be treated as a synonym for
> > "necrosis", even though it is listed in the synonyms?
> >
> > If it is added as a synonym, then how is the sentence split and the
> > filter applied? Which happens first?
> >
> >
> > sentence 2: "Necrosis not found in liver"
> >
> > Whitespace tokens:
> > Necrosis
> > not
> > found
> > in
> > liver
> >
> > Synonyms for the tokens:
> > Synonyms for "Necrosis": tissue devitalization, cellular necrosis. No
> > synonym for "not", no synonym for "found", and so on.
> >
> > Is this correct?
> >
> >
> > My main concern is when I have three sentences like this:
> >
> > tissue devitalization was observed in hepalocytes of liver
> > necrosis was observed in liver
> > Necrosis not found in liver
> >
> > When I search for "Necrosis not found", I need to get only the last sentence.
> >
> > I am not able to work out which tokenizers and analyzers I need to
> > apply in order to achieve this output.
> >
> > Awaiting reply
> > Rajani Maski
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > On Tue, Jun 14, 2011 at 3:13 PM, roySolr <royrutten1...@gmail.com> wrote:
> >
> >> Maybe you can try to escape the synonyms so they are not tokenized on
> >> whitespace:
> >>
> >> Private\ schools,NGO\ Schools,Unaided\ schools
> >>
> >> --
> >> View this message in context:
> >> http://lucene.472066.n3.nabble.com/Query-on-Synonyms-feature-in-Solr-tp3058197p3062392.html
> >> Sent from the Solr - User mailing list archive at Nabble.com.
> >>
> >
>
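
To make the position table in Erick's reply concrete, here is a minimal Python sketch of that idea. It is not Solr or Lucene code: it is a simplified model that assumes a single synonym group, ignores stemming and stop-word removal, and treats each synonym as an alternative for the start of the phrase. All function names are hypothetical. It also contrasts a phrase search with an "any term" search; that the unquoted form explains the three results reported above is only an assumption, since the thread does not show the exact query that was sent.

```python
# Simplified model of the position table quoted above. NOT Lucene/Solr code;
# stemming and stop-word removal are deliberately left out.

# Assumption: one synonym group, taken from the thread.
SYNONYMS = [["necrosis"], ["tissue", "devitalization"], ["cellular", "necrosis"]]

DOCS = [
    "tissue devitalization was observed in hepalocytes of liver",
    "necrosis was observed in liver",
    "Necrosis not found in liver",
]


def tokenize(text):
    """Split on whitespace and lowercase (roughly tokenizer + lowercase filter)."""
    return text.lower().split()


def expand_leading_synonym(tokens):
    """Return one token list per synonym alternative at the start of the query."""
    for entry in SYNONYMS:
        if tokens[:len(entry)] == entry:
            rest = tokens[len(entry):]
            return [alt + rest for alt in SYNONYMS]
    return [tokens]  # query does not start with a known synonym


def contains_phrase(doc_tokens, phrase):
    """True if the phrase tokens occur in consecutive positions in the document."""
    n = len(phrase)
    return any(doc_tokens[i:i + n] == phrase for i in range(len(doc_tokens) - n + 1))


def phrase_search(query, docs):
    """A synonym alternative must be followed directly by the remaining terms."""
    variants = expand_leading_synonym(tokenize(query))
    return [d for d in docs if any(contains_phrase(tokenize(d), v) for v in variants)]


def any_term_search(query, docs):
    """Match a document if it contains any query term or synonym term at all."""
    terms = {t for v in expand_leading_synonym(tokenize(query)) for t in v}
    return [d for d in docs if terms & set(tokenize(d))]


print(phrase_search("Necrosis not found", DOCS))
print(any_term_search("Necrosis not found", DOCS))
```

In this model the phrase search returns only "Necrosis not found in liver", because "not" and "found" must directly follow one of the three synonyms, while the any-term search returns all three sentences, since each contains at least one of the expanded terms.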
