Yes, exactly the same.

On Tue, Oct 6, 2009 at 4:52 PM, Christian Zambrano <czamb...@gmail.com> wrote:
> And you had the analyzer for that field set up the same way as shown in
> your previous e-mail when you indexed the data?
>
> On 10/06/2009 03:46 PM, Ravi Kiran wrote:
>> I did in fact check it out, and there is no weirdness in the analysis
>> page... see the result below:
>>
>> Index Analyzer
>> org.apache.solr.analysis.KeywordTokenizerFactory {}
>>   term position 1 | term text New York | term type word | source start,end 0,8 | payload
>> org.apache.solr.analysis.TrimFilterFactory {}
>>   term position 1 | term text New York | term type word | source start,end 0,8 | payload
>> org.apache.solr.analysis.StopFilterFactory {words=entity-stopwords.txt, ignoreCase=true, enablePositionIncrements=true}
>>   term position 1 | term text New York | term type word | source start,end 0,8 | payload
>> org.apache.solr.analysis.SynonymFilterFactory {synonyms=synonyms.txt, expand=false, ignoreCase=true}
>>   term position 1 | term text New York | term type word | source start,end 0,8 | payload
>> org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory {}
>>   term position 1 | term text New York | term type word | source start,end 0,8 | payload
>>
>> Query Analyzer
>> org.apache.solr.analysis.KeywordTokenizerFactory {}
>>   term position 1 | term text New York | term type word | source start,end 0,8 | payload
>> org.apache.solr.analysis.TrimFilterFactory {}
>>   term position 1 | term text New York | term type word | source start,end 0,8 | payload
>> org.apache.solr.analysis.StopFilterFactory {words=entity-stopwords.txt, ignoreCase=true, enablePositionIncrements=true}
>>   term position 1 | term text New York | term type word | source start,end 0,8 | payload
>> org.apache.solr.analysis.SynonymFilterFactory {synonyms=synonyms.txt, expand=false, ignoreCase=true}
>>   term position 1 | term text New York | term type word | source start,end 0,8 | payload
>> org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory {}
>>   term position 1 | term text New York | term type word | source start,end 0,8 | payload
>>
>> On Tue, Oct 6, 2009 at 4:19 PM, Christian Zambrano <czamb...@gmail.com> wrote:
>>> Have you tried using the Analysis page to see what tokens are generated
>>> for the string "New York"? It could be that one of the token filters is
>>> adding the token 'new' to all strings that start with 'new'.
>>>
>>> On 10/06/2009 02:54 PM, Ravi Kiran wrote:
>>>> Hello All,
>>>> I am getting some ghost facets in Solr 1.4. Can anybody kindly help me
>>>> understand why I get them and how to eliminate them? My schema.xml
>>>> snippet is given at the end. I am indexing named entities extracted via
>>>> OpenNLP into Solr. My understanding of KeywordTokenizerFactory is that it
>>>> treats the entire input as a single token, am I right? For example,
>>>> "New York" will be indexed as 'New York' and will not be split, right?
>>>> However, I see them split up in the facets, as follows, when running the
>>>> query
>>>>
>>>> http://localhost:8080/solr-admin/topicscore/select/?facet=true&facet.limit=-1
>>>>
>>>> ...but when I search with the standard handler (qt=standard&q=keyword:"New"),
>>>> I don't find any doc that has just "New". After digging in a bit, I found
>>>> that if several keywords share a common first word, that word is being
>>>> pulled out as a separate facet, like the following.
>>>> Any help is greatly appreciated.
>>>>
>>>> Result
>>>> ------------
>>>> <int name="New">47</int>          --------> Ghost
>>>> <int name="New Hampshire">7</int>
>>>> <int name="New Jersey">16</int>
>>>> <int name="New Orleans">10</int>
>>>> <int name="New York">147</int>
>>>> <int name="New York City">23</int>
>>>> <int name="New York Giants">8</int>
>>>> <int name="New York Islanders">5</int>
>>>> <int name="New York Mercantile Exchange">6</int>
>>>> <int name="New York Mets">8</int>
>>>> <int name="New York Stock Exchange">10</int>
>>>> <int name="New York Times">8</int>
>>>> <int name="New York University">5</int>
>>>> <int name="New Zealand">7</int>
>>>>
>>>> <int name="Energy">7</int>        --------> Ghost
>>>> <int name="Energy Department">5</int>
>>>> <int name="Energy Information Administration">5</int>
>>>>
>>>> <int name="Federal">7</int>       --------> Ghost
>>>> <int name="Federal Deposit Insurance Corp.">6</int>
>>>> <int name="Federal Reserve">26</int>
>>>> <int name="Federal Reserve Chairman">6</int>
>>>>
>>>> <int name="North">27</int>
>>>> <int name="North Carolina">8</int>
>>>> <int name="North Dakota">7</int>
>>>> <int name="North Korea">12</int>
>>>>
>>>> Schema.xml
>>>> -----------------
>>>>
>>>> <fieldType name="keywordText" class="solr.TextField"
>>>>            sortMissingLast="true" omitNorms="true" positionIncrementGap="100">
>>>>   <analyzer type="index">
>>>>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>>>>     <filter class="solr.TrimFilterFactory" />
>>>>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>>>>             words="stopwords.txt,entity-stopwords.txt"
>>>>             enablePositionIncrements="true"/>
>>>>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>>>>             ignoreCase="true" expand="false" />
>>>>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>>>>   </analyzer>
>>>>   <analyzer type="query">
>>>>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>>>>     <filter class="solr.TrimFilterFactory" />
>>>>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>>>>             words="stopwords.txt,entity-stopwords.txt"
>>>>             enablePositionIncrements="true"/>
>>>>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>>>>             ignoreCase="true" expand="false" />
>>>>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>>>>   </analyzer>
>>>> </fieldType>
>>>>
>>>> <field name="person" type="keywordText" indexed="true" stored="true"
>>>>        multiValued="true" termVectors="false" termPositions="false"
>>>>        termOffsets="false"/>
>>>> <field name="organization" type="keywordText" indexed="true" stored="true"
>>>>        multiValued="true" termVectors="false" termPositions="false"
>>>>        termOffsets="false"/>
>>>> <field name="location" type="keywordText" indexed="true" stored="true"
>>>>        multiValued="true" termVectors="false" termPositions="false"
>>>>        termOffsets="false"/>
>>>> <field name="keyword" type="keywordText" indexed="true" stored="true"
>>>>        multiValued="true" termVectors="false" termPositions="false"
>>>>        termOffsets="false"/>
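To make the KeywordTokenizerFactory question concrete, here is a minimal Python sketch of the index-time "keywordText" chain from the schema above, together with term-based facet counting. It is a toy model, not Solr or Lucene code: every function name is hypothetical, the stop-word and synonym arguments are made-up placeholders for stopwords.txt, entity-stopwords.txt and synonyms.txt, and SynonymFilterFactory is simplified to a one-way, whole-token replacement. In this model a keyword-tokenized value such as "New York" stays a single term through every stage, so a standalone "New" facet bucket can only appear if some document contributes the exact term "New" to the faceted field.

```python
# Toy model (not Solr/Lucene code) of the index-time "keywordText" chain from
# the schema above, plus term-based facet counting. All names are hypothetical.
from collections import Counter


def keyword_tokenize(value):
    # KeywordTokenizerFactory: the entire input becomes a single token.
    return [value]


def trim(tokens):
    # TrimFilterFactory: strip leading/trailing whitespace from each token.
    return [t.strip() for t in tokens]


def stop(tokens, stopwords):
    # StopFilterFactory with ignoreCase="true": drop a token only if the
    # whole token equals a stop word (compared case-insensitively).
    lowered = {w.lower() for w in stopwords}
    return [t for t in tokens if t.lower() not in lowered]


def synonyms(tokens, mapping):
    # Simplified stand-in for SynonymFilterFactory (ignoreCase="true",
    # expand="false"): a whole token listed in `mapping` is replaced by its target.
    lowered = {k.lower(): v for k, v in mapping.items()}
    return [lowered.get(t.lower(), t) for t in tokens]


def analyze(value, stopwords=(), synonym_map=None):
    tokens = keyword_tokenize(value)
    tokens = trim(tokens)
    tokens = stop(tokens, stopwords)
    tokens = synonyms(tokens, synonym_map or {})
    # RemoveDuplicatesTokenFilterFactory is omitted: it only drops identical
    # tokens at the same position, and here each value yields at most one token.
    return tokens


def facet_counts(docs, **analyze_kwargs):
    # Each doc is a list of values for one multiValued field. A term's facet
    # count is the number of docs whose indexed terms include that term.
    counts = Counter()
    for values in docs:
        terms = set()
        for value in values:
            terms.update(analyze(value, **analyze_kwargs))
        counts.update(terms)
    return counts


docs = [["New York", "New Jersey"], ["New York City"], [" New York "]]
print(analyze("New York"))  # ['New York']
print(sorted(facet_counts(docs).items()))
# [('New Jersey', 1), ('New York', 2), ('New York City', 1)]
```

Faceting in this model counts indexed terms rather than stored values, which is also how Solr field faceting works; that is why the analyzer chain, and not the stored text, determines which facet buckets appear.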