Hi,

I have this fieldType configuration:

<fieldType name="cod_parts" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory" />
    <filter class="solr.PatternReplaceFilterFactory" pattern="[-/\@]"
            replacement=" " />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1"
            catenateNumbers="0" catenateAll="1" splitOnCaseChange="0"
            splitOnNumerics="1" preserveOriginal="1" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.StopFilterFactory" words="stopwords.txt" />
    <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
  </analyzer>
</fieldType>

When I run the string "0000aaa" through the Solr Field Analysis tool,
the output of the last step (RemoveDuplicatesTokenFilterFactory) is:

text     | 0000aaa | 0000 | 0000aaa | aaa
position | 1       | 1    | 1       | 2
start    | 0       | 0    | 0       | 4
end      | 8       | 4    | 7       | 7
type     | word    | word | word    | word


I'm quite surprised to see two occurrences of "0000aaa".
Why? I suppose it has something to do with the position, but I
don't understand what.
Shouldn't RemoveDuplicatesTokenFilterFactory remove all the duplicates?
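
To make my expectation concrete, here is a minimal Python sketch of the
rule as I understand it from the Javadoc: a token is dropped when an
earlier token at the same position has the same term text. This is only
a simulation, not Solr code; the Token tuple and the remove_duplicates
function are names I made up for illustration, and the input is copied
from the table above.

```python
from collections import namedtuple

# Hypothetical stand-in for a Lucene token; fields mirror the analysis table.
Token = namedtuple("Token", "text position start end")


def remove_duplicates(tokens):
    """Keep only the first token for each (position, text) pair.

    Assumption: duplicates are identified by position and term text only,
    so offsets and type play no part in the comparison.
    """
    seen = set()
    kept = []
    for tok in tokens:
        key = (tok.position, tok.text)
        if key in seen:
            continue  # same text at the same position: treat as a duplicate
        seen.add(key)
        kept.append(tok)
    return kept


# Tokens exactly as shown by the Field Analysis tool in the last step.
tokens = [
    Token("0000aaa", 1, 0, 8),
    Token("0000", 1, 0, 4),
    Token("0000aaa", 1, 0, 7),
    Token("aaa", 2, 4, 7),
]

result = remove_duplicates(tokens)
for tok in result:
    print(tok)
```

Under that assumption the second "0000aaa" would be removed and only
three tokens would remain, which is not what the tool shows. The only
difference I can see between the two tokens in the table is the end
offset (8 versus 7).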


-- 
Vincenzo D'Amore
email: v.dam...@gmail.com
skype: free.dev
mobile: +39 349 8513251
