Re: Query on Synonyms feature in Solr

Erick Erickson Wed, 15 Jun 2011 10:07:38 -0700

Have you tried setting your default operator to AND
in schema.xml?
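
A minimal sketch of what that setting might look like, assuming a Solr release of this era (1.x to 3.x) in which schema.xml still carries the query parser default. The placement comment is illustrative and is not taken from the poster's configuration:

```xml
<!-- schema.xml (fragment): a direct child of the top-level <schema> element -->
<!-- Require every query term to match unless the query explicitly uses OR. -->
<solrQueryParser defaultOperator="AND"/>
```

While experimenting, the same behaviour can usually be requested per query by adding q.op=AND to the request parameters instead of editing the schema.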

Best
Erick


On Wed, Jun 15, 2011 at 12:36 PM, rajini maski <rajinima...@gmail.com> wrote:
> ok. Thank you. I will consider this.
>
> One last doubt: how do I handle negation terms?
>
> As I mentioned in the above mail, if I have 3 sentences like this:
>
> 1. tissue devitalization was observed in hepalocytes of liver
> 2. necrosis was observed in liver
> 3. Necrosis not found in liver
>
> When I search "Necrosis not found" I need to get only the last sentence, but
> now I get all 3 results.
>
> I am not able to work out which tokenizers and analyzers I need to
> apply in order to achieve this desired output.
>
> Awaiting reply
> Rajani Maski
>
> As explained in the above mail,
>
> On Wed, Jun 15, 2011 at 9:42 PM, Erick Erickson <erickerick...@gmail.com> wrote:
>
>> Well, first, it is usually unnecessary to specify the
>> synonym filter at both index and query time. I'd apply
>> it only at query time to start, then perhaps switch
>> to index time. See the discussion at:
>>
>> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#head-2c461ac74b4ddd82e453dc68fcfc92da77358d46
>> for why index-time is preferable.
>> Note that if you switch to index time you'll have to re-index.
>>
>> That said, essentially what happens (and assuming
>> synonym filter is only in the query part) is you have
>> something like this as your search for "necrosis not
>> found".
>>
>> Offset 0                 Offset 1    Offset 2
>> necrosis                 not         found
>> tissue devitalization
>> cellular necrosis
>>
>>
>> Note that one of your three synonyms must appear in position 0,
>> followed by the other two terms.
>>
>> So your example should "just work". But as I said, it would probably
>> be best to put your synonym filter only at index time or only at query time, not both.
>>
>> An analogous process happens if you add synonyms at index
>> time.
>>
>> Best
>> Erick
>>
>> On Wed, Jun 15, 2011 at 8:14 AM, rajini maski <rajinima...@gmail.com> wrote:
>> > Erick: I have tried what you said. I need clarification on this. My doubt
>> > is below:
>> >
>> > Say I have this field type:
>> >
>> > <fieldType name="Synonymdata" class="solr.TextField" positionIncrementGap="100">
>> >   <analyzer type="index">
>> >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>> >     <filter class="org.apache.solr.orchsynonym.OrchSynonymFilter"
>> >             synonyms="BODYTaxonomy.txt,PalpClinLocObsTaxo.txt,MacroscopicTaxonomy.txt,MicroscopicTaxonomy.txt,SpecimenTaxonomy.txt,ParameterTaxonomy.txt,StrainTaxonomy.txt"
>> >             ignoreCase="true" expand="true"/>
>> >     <filter class="solr.LowerCaseFilterFactory"/>
>> >     <filter class="solr.SnowballPorterFilterFactory" language="English"
>> >             protected="protwords.txt"/>
>> >   </analyzer>
>> >   <analyzer type="query">
>> >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>> >     <filter class="org.apache.solr.orchsynonym.OrchSynonymFilter"
>> >             synonyms="BODYTaxonomy.txt,PalpClinLocObsTaxo.txt,MacroscopicTaxonomy.txt,MicroscopicTaxonomy.txt,SpecimenTaxonomy.txt,ParameterTaxonomy.txt,StrainTaxonomy.txt"
>> >             ignoreCase="true" expand="false"/>
>> >     <filter class="solr.StopFilterFactory" ignoreCase="true"
>> >             words="stopwords.txt" enablePositionIncrements="true"/>
>> >     <filter class="solr.LowerCaseFilterFactory"/>
>> >     <filter class="solr.SnowballPorterFilterFactory" language="English"
>> >             protected="protwords.txt"/>
>> >   </analyzer>
>> > </fieldType>
>> >
>> >
>> >
>> > The data indexed in this field is :
>> >
>> > sentence 1 : " tissue devitalization was noted in hepalocytes of liver"
>> > sentence 2 :  "Necrosis not found in liver"
>> >
>> > Synonyms:
>> > necrosis , tissue devitalization, cellular necrosis
>> >
>> > How do the whitespace tokenizer and the synonym filter behave? I am not able to
>> > understand it from the analysis page. Please let me know whether it works
>> > like this, and correct me if I am wrong.
>> >
>> > sentence 1 : "tissue devitalization was noted in hepalocytes of liver"
>> >
>> > white space :
>> > tissue
>> > devitalization
>> > was
>> > noted
>> > in
>> > hepalocytes
>> > of
>> > liver
>> >
>> > Synonyms for token words:
>> > No synonyms for tissue, no synonym for devitalization, and so on.
>> > So will "tissue devitalization" not become a synonym for
>> > Necrosis (since it is listed in the synonyms)?
>> >
>> > If it is added as a synonym, then how does it split the sentence and
>> > apply the filter? Which happens first?
>> >
>> >
>> > Sentence 2: Necrosis not found in liver
>> >
>> > white space :
>> > Necrosis
>> > not
>> > found
>> > in
>> > liver
>> >
>> > Synonyms for token words:
>> > synonyms for Necrosis: tissue devitalization, cellular necrosis; no synonym
>> > for not, no synonym for found, and so on.
>> >
>> > Is this correct?
>> >
>> >
>> > My main concern is when I have 3 sets of data like this:
>> >
>> > tissue devitalization was observed in hepalocytes of liver
>> > necrosis was observed in liver
>> > Necrosis not found in liver
>> >
>> > When I search "Necrosis not found" I need to get only the last sentence.
>> >
>> > I am not able to work out which tokenizers and analyzers I need to
>> > apply in order to achieve this desired output.
>> >
>> > Awaiting reply
>> > Rajani Maski
>> >
>> > On Tue, Jun 14, 2011 at 3:13 PM, roySolr <royrutten1...@gmail.com> wrote:
>> >
>> >> Maybe you can try to escape the spaces in the synonyms so they are not
>> >> tokenized on whitespace:
>> >>
>> >> Private\ schools,NGO\ Schools,Unaided\ schools
>> >>
>> >
>>
>
