I did in fact check it out, and there is no weirdness on the analysis page. See the results below.

Index Analyzer

org.apache.solr.analysis.KeywordTokenizerFactory {}
    term position     1
    term text         New York
    term type         word
    source start,end  0,8
    payload

org.apache.solr.analysis.TrimFilterFactory {}
    term position     1
    term text         New York
    term type         word
    source start,end  0,8
    payload

org.apache.solr.analysis.StopFilterFactory {words=entity-stopwords.txt, ignoreCase=true, enablePositionIncrements=true}
    term position     1
    term text         New York
    term type         word
    source start,end  0,8
    payload

org.apache.solr.analysis.SynonymFilterFactory {synonyms=synonyms.txt, expand=false, ignoreCase=true}
    term position     1
    term text         New York
    term type         word
    source start,end  0,8
    payload

org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory {}
    term position     1
    term text         New York
    term type         word
    source start,end  0,8
    payload

Query Analyzer

org.apache.solr.analysis.KeywordTokenizerFactory {}
    term position     1
    term text         New York
    term type         word
    source start,end  0,8
    payload

org.apache.solr.analysis.TrimFilterFactory {}
    term position     1
    term text         New York
    term type         word
    source start,end  0,8
    payload

org.apache.solr.analysis.StopFilterFactory {words=entity-stopwords.txt, ignoreCase=true, enablePositionIncrements=true}
    term position     1
    term text         New York
    term type         word
    source start,end  0,8
    payload

org.apache.solr.analysis.SynonymFilterFactory {synonyms=synonyms.txt, expand=false, ignoreCase=true}
    term position     1
    term text         New York
    term type         word
    source start,end  0,8
    payload

org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory {}
    term position     1
    term text         New York
    term type         word
    source start,end  0,8
    payload

On Tue, Oct 6, 2009 at 4:19 PM, Christian Zambrano <czamb...@gmail.com> wrote:

> Have you tried using the Analysis page to see what tokens are generated for
> the string "New York"? It could be that one of the token filters is adding
> the token 'new' for all strings that start with 'new'.
>
> On 10/06/2009 02:54 PM, Ravi Kiran wrote:
>
>> Hello All,
>> I am getting some ghost facets in Solr 1.4.
>> Can anybody kindly help me understand why I get them and how to
>> eliminate them? My schema.xml snippet is given at the end. I am indexing
>> Named Entities extracted via OpenNLP into Solr. My understanding of
>> KeywordTokenizerFactory is that it will treat all the words as a single
>> token, am I right? For example, "New York" will be indexed as 'New York'
>> and will not be split, right? However, I see them split up in the facets
>> as follows when running the query
>>
>> http://localhost:8080/solr-admin/topicscore/select/?facet=true&facet.limit=-1
>>
>> but when I search with the standard handler, qt=standard&q=keyword:"New",
>> I don't find any doc which has just "New". After digging in a bit, I found
>> that if several keywords have a common starting word, it is being pulled
>> out as another facet, like the following. Any help is greatly appreciated.
>>
>> Result
>> ------------
>> <int name="New">47</int> --------> Ghost
>> <int name="New Hampshire">7</int>
>> <int name="New Jersey">16</int>
>> <int name="New Orleans">10</int>
>> <int name="New York">147</int>
>> <int name="New York City">23</int>
>> <int name="New York Giants">8</int>
>> <int name="New York Islanders">5</int>
>> <int name="New York Mercantile Exchange">6</int>
>> <int name="New York Mets">8</int>
>> <int name="New York Stock Exchange">10</int>
>> <int name="New York Times">8</int>
>> <int name="New York University">5</int>
>> <int name="New Zealand">7</int>
>>
>> <int name="Energy">7</int> --------------> Ghost
>> <int name="Energy Department">5</int>
>> <int name="Energy Information Administration">5</int>
>>
>> <int name="Federal">7</int> --------------> Ghost
>> <int name="Federal Deposit Insurance Corp.">6</int>
>> <int name="Federal Reserve">26</int>
>> <int name="Federal Reserve Chairman">6</int>
>>
>> <int name="North">27</int>
>> <int name="North Carolina">8</int>
>> <int name="North Dakota">7</int>
>> <int name="North Korea">12</int>
>>
>> Schema.xml
>> -----------------
>>
>> <fieldType name="keywordText" class="solr.TextField"
>>            sortMissingLast="true" omitNorms="true" positionIncrementGap="100">
>>   <analyzer type="index">
>>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>>     <filter class="solr.TrimFilterFactory" />
>>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>>             words="stopwords.txt,entity-stopwords.txt"
>>             enablePositionIncrements="true"/>
>>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>>             ignoreCase="true" expand="false" />
>>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>>   </analyzer>
>>   <analyzer type="query">
>>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>>     <filter class="solr.TrimFilterFactory" />
>>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>>             words="stopwords.txt,entity-stopwords.txt"
>>             enablePositionIncrements="true" />
>>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>>             ignoreCase="true" expand="false" />
>>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>>   </analyzer>
>> </fieldType>
>>
>> <field name="person" type="keywordText" indexed="true" stored="true"
>>        multiValued="true" termVectors="false" termPositions="false"
>>        termOffsets="false"/>
>> <field name="organization" type="keywordText" indexed="true" stored="true"
>>        multiValued="true" termVectors="false" termPositions="false"
>>        termOffsets="false"/>
>> <field name="location" type="keywordText" indexed="true" stored="true"
>>        multiValued="true" termVectors="false" termPositions="false"
>>        termOffsets="false"/>
>> <field name="keyword" type="keywordText" indexed="true" stored="true"
>>        multiValued="true" termVectors="false" termPositions="false"
>>        termOffsets="false"/>
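
For anyone following the thread, here is a rough way to list the facet values worth checking. This is only a sketch and not something from the posts above: the function name find_prefix_facets and the sample dict are made up for illustration, with counts copied from the facet result quoted in the original question. It flags any facet value that is a whole-word prefix of another value. That makes a value a candidate only; "New York" is flagged too even though it is a real entity, so each candidate still has to be compared against the stored field values.

```python
def find_prefix_facets(counts):
    """Map each facet value to the longer values it is a whole-word prefix of.

    A value is only reported if at least one other value starts with it
    followed by a space. This marks candidates, not confirmed ghosts.
    """
    values = list(counts)
    suspects = {}
    for value in values:
        longer = [v for v in values if v.startswith(value + " ")]
        if longer:
            suspects[value] = longer
    return suspects


# Hypothetical sample: a subset of the facet counts quoted in the thread.
sample = {
    "New": 47,
    "New Jersey": 16,
    "New York": 147,
    "New York City": 23,
    "Energy": 7,
    "Energy Department": 5,
    "North": 27,
    "North Korea": 12,
}

for value, longer in sorted(find_prefix_facets(sample).items()):
    print(f"{value} ({sample[value]}): prefix of {len(longer)} other value(s)")
```

Any candidate that cannot be found as a stored value in a document would be the interesting one to trace back to the indexing side.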