OK, thank you. I will consider this. One last question: how do I handle negation terms?
As I mentioned in the mail above, if I have 3 sentences like this:

1. tissue devitalization was observed in hepalocytes of liver
2. necrosis was observed in liver
3. Necrosis not found in liver

When I search for "Necrosis not found", I need to get only the last sentence, but at the moment I get all 3 results.

I am not able to work out which tokenizers and analyzers I need to apply in order to achieve this output.

Awaiting reply
Rajani Maski

As explained in the above mail,

On Wed, Jun 15, 2011 at 9:42 PM, Erick Erickson <erickerick...@gmail.com> wrote:

> Well, first it is usually unnecessary to specify the
> synonym filter both at index and query time, I'd apply
> it only at query time to start, then perhaps switch
> to index time, see the discussion at:
>
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#head-2c461ac74b4ddd82e453dc68fcfc92da77358d46
> for why index-time is preferable.
> Note you'll have to re-index.
>
> That said, essentially what happens (and assuming
> synonym filter is only in the query part) is you have
> something like this as your search for "necrosis not
> found".
>
> Offset 0                 offset 1   offset 2
> necrosis
> tissue devitalization    not        found
> cellular necrosis
>
> Note that one of your three synonyms must appear in position 0,
> followed by the other two terms.
>
> So your example should "just work". But as I said, it would probably
> be best if you put your synonym filter only in at index or query time.
>
> An analogous process happens if you add synonyms at index
> time.
>
> Best
> Erick
>
> On Wed, Jun 15, 2011 at 8:14 AM, rajini maski <rajinima...@gmail.com> wrote:
> > Erick: I have tried what you said. I needed clarification on this..
> > Below is my doubt added:
> >
> > Say if I have this field type:
> >
> > <fieldType name="Synonymdata" class="solr.TextField" positionIncrementGap="100">
> >   <analyzer type="index">
> >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >     <filter class="org.apache.solr.orchsynonym.OrchSynonymFilter"
> >             synonyms="BODYTaxonomy.txt,PalpClinLocObsTaxo.txt,MacroscopicTaxonomy.txt,MicroscopicTaxonomy.txt,SpecimenTaxonomy.txt,ParameterTaxonomy.txt,StrainTaxonomy.txt"
> >             ignoreCase="true" expand="true"/>
> >     <filter class="solr.LowerCaseFilterFactory"/>
> >     <filter class="solr.SnowballPorterFilterFactory" language="English"
> >             protected="protwords.txt"/>
> >   </analyzer>
> >   <analyzer type="query">
> >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >     <filter class="org.apache.solr.orchsynonym.OrchSynonymFilter"
> >             synonyms="BODYTaxonomy.txt,PalpClinLocObsTaxo.txt,MacroscopicTaxonomy.txt,MicroscopicTaxonomy.txt,SpecimenTaxonomy.txt,ParameterTaxonomy.txt,StrainTaxonomy.txt"
> >             ignoreCase="true" expand="false"/>
> >     <filter class="solr.StopFilterFactory" ignoreCase="true"
> >             words="stopwords.txt" enablePositionIncrements="true"/>
> >     <filter class="solr.LowerCaseFilterFactory"/>
> >     <filter class="solr.SnowballPorterFilterFactory" language="English"
> >             protected="protwords.txt"/>
> >   </analyzer>
> > </fieldType>
> >
> > The data indexed in this field is:
> >
> > sentence 1: "tissue devitalization was noted in hepalocytes of liver"
> > sentence 2: "Necrosis not found in liver"
> >
> > Synonyms:
> > necrosis, tissue devitalization, cellular necrosis
> >
> > How do the whitespace tokenizer and the synonym filter behave? I am not
> > able to understand it from the analysis page. Please let me know whether
> > it works like this, and correct me if I am wrong.
> >
> > sentence 1: "tissue devitalization was noted in hepalocytes of liver"
> >
> > white space:
> > tissue
> > devitalization
> > was
> > noted
> > in
> > hepalocytes
> > of
> > liver
> >
> > Synonyms for token words:
> > No synonyms for tissue, no synonym for devitalization, and so on...
> > So will "tissue devitalization" not become a synonym for
> > Necrosis (even though it is listed in the synonyms)?
> >
> > If it is added as a synonym, then how is the sentence split and the
> > filter applied? Which happens first?
> >
> > Sentence 2: Necrosis not found in liver
> >
> > white space:
> > Necrosis
> > not
> > found
> > in
> > liver
> >
> > Synonyms for token words:
> > synonyms for Necrosis: tissue devitalization, cellular necrosis; no synonym
> > for not, no synonym for found, and so on...
> >
> > Is this correct?
> >
> > My main concern is when I have 3 sets of data like this:
> >
> > tissue devitalization was observed in hepalocytes of liver
> > necrosis was observed in liver
> > Necrosis not found in liver
> >
> > When I search "Necrosis not found" I need to get only the last sentence.
> >
> > I am not able to find out the list of tokens and analysers that I need to
> > apply in order to achieve this desired output.
> >
> > Awaiting reply
> > Rajani Maski
> >
> > On Tue, Jun 14, 2011 at 3:13 PM, roySolr <royrutten1...@gmail.com> wrote:
> >
> >> Maybe you can try to escape the synonyms so they are not tokenized by
> >> whitespace..
> >>
> >> Private\ schools,NGO\ Schools,Unaided\ schools
> >>
> >> --
> >> View this message in context:
> >> http://lucene.472066.n3.nabble.com/Query-on-Synonyms-feature-in-Solr-tp3058197p3062392.html
> >> Sent from the Solr - User mailing list archive at Nabble.com.
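
For anyone following the thread, the behaviour discussed above (whitespace tokenizing, multi-word synonyms, and why a search can return all three sentences) can be sketched in a few lines of Python. This is only a toy model under stated assumptions, not Solr code: the function names (analyze, phrase_match, any_term_match, search) are hypothetical, every synonym is collapsed to the single term "necrosis" (roughly what expand="false" would do), and stemming and stop words are left out.

```python
# Toy model of the analysis chain discussed in this thread. NOT Solr code:
# all names here are hypothetical and exist only for illustration.

# Assumption: every synonym is reduced to one canonical term.
SYNONYMS = {
    ("tissue", "devitalization"): "necrosis",
    ("cellular", "necrosis"): "necrosis",
}

DOCS = [
    "tissue devitalization was observed in hepalocytes of liver",
    "necrosis was observed in liver",
    "Necrosis not found in liver",
]


def analyze(text):
    """Lowercase, split on whitespace, then collapse multi-word synonyms."""
    tokens = text.lower().split()
    out = []
    i = 0
    while i < len(tokens):
        # Try each multi-word synonym starting at the current position.
        for phrase, canonical in SYNONYMS.items():
            if tuple(tokens[i:i + len(phrase)]) == phrase:
                out.append(canonical)
                i += len(phrase)
                break
        else:
            out.append(tokens[i])
            i += 1
    return out


def phrase_match(doc_tokens, query_tokens):
    """True if the query tokens occur consecutively and in order."""
    n = len(query_tokens)
    return any(doc_tokens[i:i + n] == query_tokens
               for i in range(len(doc_tokens) - n + 1))


def any_term_match(doc_tokens, query_tokens):
    """True if at least one query token occurs anywhere (OR semantics)."""
    return any(t in doc_tokens for t in query_tokens)


def search(query, matcher):
    """Return the 1-based numbers of the sentences that match."""
    q = analyze(query)
    return [n for n, doc in enumerate(DOCS, start=1) if matcher(analyze(doc), q)]


print(search("Necrosis not found", any_term_match))             # [1, 2, 3]
print(search("Necrosis not found", phrase_match))               # [3]
print(search("tissue devitalization not found", phrase_match))  # [3]
```

In this model, matching on any single term returns all three sentences because each of them contains "necrosis" after synonym mapping, while matching the three terms as one phrase returns only the third. In Solr the phrase case corresponds to sending the query wrapped in double quotes; whether unquoted terms are ORed depends on the configured default operator, so that part is an assumption here. It may also be worth checking whether stopwords.txt lists "not", since the query analyzer in the field type above applies a stop filter and the index analyzer does not.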