Hi, I have this fieldType configuration:
<fieldType name="cod_parts" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory" />
    <filter class="solr.PatternReplaceFilterFactory" pattern="[-/\@]" replacement=" " />
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="0" catenateAll="1"
            splitOnCaseChange="0" splitOnNumerics="1"
            preserveOriginal="1" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.StopFilterFactory" words="stopwords.txt" />
    <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
  </analyzer>
</fieldType>

Using the Solr Field Analysis tool on the string "0000aaa", the output of the last step is:

text     | 0000aaa | 0000 | 0000aaa | aaa
position | 1       | 1    | 1       | 2
start    | 0       | 0    | 0       | 4
end      | 8       | 4    | 7       | 7
type     | word    | word | word    | word

I'm quite surprised to see two occurrences of "0000aaa". Why?

I suppose it has something to do with the position, but I don't understand what. Shouldn't RemoveDuplicatesTokenFilterFactory remove all the duplicates?

--
Vincenzo D'Amore
email: v.dam...@gmail.com
skype: free.dev
mobile: +39 349 8513251
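
P.S. To make my expectation concrete, here is a small Python sketch of how I understand the duplicate removal to work. It is only my assumption about the rule, not Solr's actual code: a token is dropped when the same term text has already been seen at the same position, and the start/end offsets are ignored. The Token type and the remove_duplicates function are names I made up for illustration; the token values are copied from the table above.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    """Hypothetical stand-in for one column of the analysis table."""
    text: str
    position: int
    start: int
    end: int


def remove_duplicates(tokens: List[Token]) -> List[Token]:
    """Assumed rule: drop a token whose text was already seen at the
    same position. Offsets are deliberately not part of the comparison."""
    seen = set()
    current_position = None
    result = []
    for tok in tokens:
        if tok.position != current_position:
            # New position: forget the texts seen at the previous one.
            current_position = tok.position
            seen = set()
        if tok.text in seen:
            continue  # same text, same position: treated as a duplicate
        seen.add(tok.text)
        result.append(tok)
    return result


# Tokens as displayed by the analysis tool for the last step.
tokens = [
    Token("0000aaa", 1, 0, 8),
    Token("0000", 1, 0, 4),
    Token("0000aaa", 1, 0, 7),
    Token("aaa", 2, 4, 7),
]

deduped = remove_duplicates(tokens)
for tok in deduped:
    print(tok)
```

Under this assumed rule only three tokens would survive, because both "0000aaa" tokens sit at position 1. Since the tool shows four, either my understanding of the rule is wrong or the two term texts are not really identical (they do have different end offsets, 8 and 7).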