Thank you very much. I shall try out the tokenizerFactory attribute on SynonymFilterFactory.
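
A minimal sketch of the change I have in mind, not yet tested against my index. The choice of solr.KeywordTokenizerFactory for parsing the synonym file is my assumption; I picked it because it matches the tokenizer already used in the keywordText analyzer:

```xml
<!-- Same SynonymFilterFactory entry as in the keywordText index analyzer, with tokenizerFactory added. -->
<!-- Assumption: KeywordTokenizerFactory keeps each synonyms.txt entry (e.g. "New York") as a single token. -->
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
        ignoreCase="true" expand="false"
        tokenizerFactory="solr.KeywordTokenizerFactory"/>
```

If this works as I expect, the rule "New York, N.Y., NY => New York" should map "NY" to the single term "New York" instead of separate "New" and "York" tokens. I will check the output in analysis.jsp after reindexing.
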
On Tue, Oct 13, 2009 at 12:27 AM, Chris Hostetter <hossman_luc...@fucit.org> wrote:

>
> : I had to be brief as my facets are in the order of 100K over 800K documents
> : and also if I give the complete schema.xml I was afraid nobody would read my
> : long message :-) ..Hence I showed only relevant pieces of the result showing
> : different fields having same problem
>
> relevant is good, but you have to provide a consistent picture from start
> to finish ... you don't need to show 1,000 lines of facet field output,
> but you at least need to show the field names.
>
> : <fieldType name="keywordText" class="solr.TextField"
> : sortMissingLast="true" omitNorms="true" positionIncrementGap="100">
> : <analyzer type="index">
> : <tokenizer class="solr.KeywordTokenizerFactory"/>
> : <filter class="solr.TrimFilterFactory" />
> : <filter class="solr.StopFilterFactory" ignoreCase="true"
> : words="stopwords.txt,entity-stopwords.txt" enablePositionIncrements="true"/>
> :
> : <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
> : ignoreCase="true" expand="false" />
> : <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
> : </analyzer>
>
> ...have you used analysis.jsp to see what terms that analyzer produces
> based on the strings you are indexing for your documents? because
> combined with synonyms like this...
>
> : New York, N.Y., NY => New York
>
> ...it doesn't surprise me that you're getting "New" as an indexed term.
> By default SynonymFilter uses whitespace to delimit tokens in multi-token
> synonyms, so for some input like "NY" you should see it produce the tokens
> "New" and "York"
>
> you can use the tokenizerFactory attribute on SynonymFilterFactory to
> specify a TokenizerFactory class to use when parsing synonyms.txt
>
> -Hoss