Re: schema.xml field configuration

Erick Erickson Fri, 09 Oct 2015 08:30:18 -0700

Seems odd to me as well. I suspect you can work around
this by setting either catenateAll="0" or preserveOriginal="0".
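
For illustration, the first option might look like the sketch below. It
assumes the rest of the analyzer chain stays exactly as in the quoted
fieldType, and it has not been run against this input, so check the result
in the Field Analysis tool. The reasoning is a guess: with
preserveOriginal="1" already emitting "0000aaa", catenateAll="1" presumably
emits the same string a second time, so turning one of them off should
leave a single copy.

```xml
<!-- Same filter as in the quoted fieldType; only catenateAll is changed to "0" -->
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
        generateNumberParts="1" catenateWords="1"
        catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"
        splitOnNumerics="1" preserveOriginal="1" />
```

The other option would keep catenateAll="1" and set preserveOriginal="0"
instead.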


Best,
Erick

On Fri, Oct 9, 2015 at 7:50 AM, Vincenzo D'Amore <v.dam...@gmail.com> wrote:
> Hi,
>
> I have this fieldType configuration:
>
> <fieldType name="cod_parts" class="solr.TextField"
> positionIncrementGap="100">
> <analyzer>
> <tokenizer class="solr.KeywordTokenizerFactory" />
> <filter class="solr.PatternReplaceFilterFactory" pattern="[-/\@]"
> replacement=" " />
> <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
> generateNumberParts="1" catenateWords="1"
> catenateNumbers="0" catenateAll="1" splitOnCaseChange="0"
> splitOnNumerics="1" preserveOriginal="1" />
> <filter class="solr.LowerCaseFilterFactory" />
> <filter class="solr.StopFilterFactory" words="stopwords.txt" />
> <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
> </analyzer>
> </fieldType>
>
> Using the Solr Field Analysis tool on the string "0000aaa", the last step
> shows this:
>
> text     | 0000aaa | 0000 | 0000aaa | aaa
> position | 1       | 1    | 1       | 2
> start    | 0       | 0    | 0       | 4
> end      | 8       | 4    | 7       | 7
> type     | word    | word | word    | word
>
>
> Now I'm quite surprised to see two occurrences of "0000aaa".
> Why? I suppose it has something to do with the position, but I
> don't understand what.
> Shouldn't RemoveDuplicatesTokenFilterFactory remove all the duplicates?
>
>
> --
> Vincenzo D'Amore
> email: v.dam...@gmail.com
> skype: free.dev
> mobile: +39 349 8513251
