Seems odd to me as well. I suspect you can work around this by setting either catenateAll="0" or preserveOriginal="0".
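
A rough, untested sketch of the first option: this is the WordDelimiterFilterFactory line from the quoted configuration with only catenateAll changed, and every other attribute copied as-is from the original message. The assumption is that the second "0000aaa" comes from catenateAll while the first comes from preserveOriginal, so switching off either one should leave a single copy.

```xml
<!-- Sketch only: same filter as in the quoted fieldType, with catenateAll switched off.
     Assumes the duplicate "0000aaa" is the catenateAll output. -->
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
        generateNumberParts="1" catenateWords="1" catenateNumbers="0"
        catenateAll="0" splitOnCaseChange="0" splitOnNumerics="1"
        preserveOriginal="1" />
```

Re-running the Field Analysis tool on "0000aaa" after the change would be the way to confirm whether the duplicate is gone.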

Best,
Erick

On Fri, Oct 9, 2015 at 7:50 AM, Vincenzo D'Amore <v.dam...@gmail.com> wrote:
> Hi,
>
> I have this fieldType configuration:
>
> <fieldType name="cod_parts" class="solr.TextField" positionIncrementGap="100">
>   <analyzer>
>     <tokenizer class="solr.KeywordTokenizerFactory" />
>     <filter class="solr.PatternReplaceFilterFactory" pattern="[-/\@]" replacement=" " />
>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
>             generateNumberParts="1" catenateWords="1" catenateNumbers="0"
>             catenateAll="1" splitOnCaseChange="0" splitOnNumerics="1"
>             preserveOriginal="1" />
>     <filter class="solr.LowerCaseFilterFactory" />
>     <filter class="solr.StopFilterFactory" words="stopwords.txt" />
>     <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
>   </analyzer>
> </fieldType>
>
> Using the Solr Field Analysis tool on the string "0000aaa", at the end of
> the last step I see this:
>
> text     | 0000aaa | 0000 | 0000aaa | aaa
> position | 1       | 1    | 1       | 2
> start    | 0       | 0    | 0       | 4
> end      | 8       | 4    | 7       | 7
> type     | word    | word | word    | word
>
> Now I'm quite surprised to see there are two occurrences of "0000aaa".
> Why? I suppose it has something to do with the position, but I
> don't understand what.
> Shouldn't RemoveDuplicatesTokenFilterFactory remove all the duplicates?
>
> --
> Vincenzo D'Amore
> email: v.dam...@gmail.com
> skype: free.dev
> mobile: +39 349 8513251