Re: Query on Synonyms feature in Solr

Erick Erickson Wed, 15 Jun 2011 09:13:49 -0700

Well, first, it is usually unnecessary to apply the
synonym filter at both index and query time. I'd apply
it only at query time to start, then perhaps switch
to index time; see the discussion at:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#head-2c461ac74b4ddd82e453dc68fcfc92da77358d46
for why index-time is preferable.
Note you'll have to re-index if you change the index-time analysis.


That said, essentially what happens (assuming the
synonym filter is only in the query analyzer) is that
your search for "necrosis not found" turns into
something like this:

offset 0                 offset 1    offset 2
necrosis                 not         found
tissue devitalization
cellular necrosis


Note that one of your three synonyms must appear in position 0,
followed by the other two terms.
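
As a rough illustration of that matching rule, here is a toy Python model.
It is not Solr's or Lucene's actual code: it assumes the query is treated
as a phrase, it ignores stemming and stop words, and the names SYNONYMS,
expand_query and phrase_matches are made up for this sketch.

```python
from itertools import product

# Toy model only: one synonym group, keyed by the term typed in the query.
SYNONYMS = {
    "necrosis": ["necrosis", "tissue devitalization", "cellular necrosis"],
}

def expand_query(query):
    """Return one list of alternatives per query position."""
    return [SYNONYMS.get(tok, [tok]) for tok in query.lower().split()]

def phrase_matches(query, document):
    """True if some choice of alternatives occurs as consecutive words."""
    doc_tokens = document.lower().split()
    for combo in product(*expand_query(query)):
        phrase = " ".join(combo).split()
        n = len(phrase)
        if any(doc_tokens[i:i + n] == phrase
               for i in range(len(doc_tokens) - n + 1)):
            return True
    return False

docs = [
    "tissue devitalization was observed in hepalocytes of liver",
    "necrosis was observed in liver",
    "Necrosis not found in liver",
]
matches = [d for d in docs if phrase_matches("necrosis not found", d)]
print(matches)  # ['Necrosis not found in liver']
```

Only the last sentence has one of the three alternatives immediately
followed by "not" and "found", so it is the only one this model returns.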

So your example should "just work". But as I said, it would probably
be best to put your synonym filter at index time or at query time,
not both.

An analogous process happens if you add synonyms at index
time.
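
A similar toy sketch for the index-time case, again in Python and again
only a simplified model under stated assumptions: the text is split on
whitespace first, then any run of tokens that equals a synonym entry
(including a multi-word one) is replaced by the whole group stacked in a
single slot, as in the diagram above. The real filter tracks positions in
more detail; SYNONYM_GROUP and index_positions are hypothetical names.

```python
# Toy model only; not Solr's implementation.
SYNONYM_GROUP = ["necrosis", "tissue devitalization", "cellular necrosis"]

def index_positions(text, group=SYNONYM_GROUP):
    """Split on whitespace, then stack the group wherever an entry occurs."""
    tokens = text.lower().split()
    # Try longer entries first so "cellular necrosis" wins over "necrosis".
    entries = sorted((e.split() for e in group), key=len, reverse=True)
    positions, i = [], 0
    while i < len(tokens):
        for entry in entries:
            if tokens[i:i + len(entry)] == entry:
                positions.append(list(group))  # all synonyms share this slot
                i += len(entry)
                break
        else:
            positions.append([tokens[i]])      # ordinary token, no synonyms
            i += 1
    return positions

slots = index_positions("tissue devitalization was noted in hepalocytes of liver")
for offset, terms in enumerate(slots):
    print(offset, terms)
```

In this model the tokenizer runs first and the synonym step then looks
across neighbouring tokens, which is why a two-word entry can still match
after whitespace splitting.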

Best
Erick

On Wed, Jun 15, 2011 at 8:14 AM, rajini maski <rajinima...@gmail.com> wrote:
> Erick: I have tried what you said. I need clarification on this. My
> question is below:
>
> Say I have this field type:
>
> <fieldType name="Synonymdata" class="solr.TextField"
>            positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="org.apache.solr.orchsynonym.OrchSynonymFilter"
>             synonyms="BODYTaxonomy.txt,PalpClinLocObsTaxo.txt,MacroscopicTaxonomy.txt,MicroscopicTaxonomy.txt,SpecimenTaxonomy.txt,ParameterTaxonomy.txt,StrainTaxonomy.txt"
>             ignoreCase="true" expand="true"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.SnowballPorterFilterFactory" language="English"
>             protected="protwords.txt"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="org.apache.solr.orchsynonym.OrchSynonymFilter"
>             synonyms="BODYTaxonomy.txt,PalpClinLocObsTaxo.txt,MacroscopicTaxonomy.txt,MicroscopicTaxonomy.txt,SpecimenTaxonomy.txt,ParameterTaxonomy.txt,StrainTaxonomy.txt"
>             ignoreCase="true" expand="false"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>             words="stopwords.txt" enablePositionIncrements="true"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.SnowballPorterFilterFactory" language="English"
>             protected="protwords.txt"/>
>   </analyzer>
> </fieldType>
>
>
>
> The data indexed in this field is :
>
> sentence 1 : " tissue devitalization was noted in hepalocytes of liver"
> sentence 2 :  "Necrosis not found in liver"
>
> Synonyms:
> necrosis , tissue devitalization, cellular necrosis
>
> How do the whitespace tokenizer and the synonym filter behave? I am not
> able to understand it from the analysis page. Please let me know whether
> it works like this, and correct me if I am wrong.
>
> sentence 1 : " tissue devitalization was noted in hepalocytes of liver"
>
> white space :
> tissue
>  devitalization
>  was
>  noted
>  in
>  hepalocytes
>  of
> liver
>
> Synonyms for token words:
> No synonyms for tissue, no synonym for devitalization, and so on.
> So will "tissue devitalization" not become a synonym for
> Necrosis (even though it is listed as a synonym)?
>
> If it is added as a synonym, then how is the sentence split and the
> filter applied? Which happens first?
>
>
> Sentence 2: Necrosis not  found in liver
>
>
> white space
> Necrosis
> not
>  found
>  in
>  liver
>
>
> Synonyms for token words:
> synonyms for Necrosis: tissue devitalization, cellular necrosis; no synonym
> for not, no synonym for found, and so on.
>
> Is this correct?
>
>
> My main concern is when I have three sets of data like this:
>
> tissue devitalization was observed in hepalocytes of liver
> necrosis was observed in liver
> Necrosis not found in liver
>
> When I search "Necrosis not found" I need to get only the last sentence.
>
> I am not able to work out which tokenizers and analyzers I need to
> apply in order to achieve this desired output.
>
> Awaiting reply
> Rajani Maski
>
> On Tue, Jun 14, 2011 at 3:13 PM, roySolr <royrutten1...@gmail.com> wrote:
>
>> Maybe you can try to escape the synonyms so they are not tokenized on whitespace:
>>
>> Private\ schools,NGO\ Schools,Unaided\ schools
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/Query-on-Synonyms-feature-in-Solr-tp3058197p3062392.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>
