Hi,

Once I apply PatternReplaceCharFilterFactory to the input string, the
positions of the tokens change.
Here is an example.
<charFilter class="solr.PatternReplaceCharFilterFactory"
pattern="(&lt;/?ce:italic[^>]*>)" replacement=""/>
<filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1"
                generateNumberParts="1"
                splitOnCaseChange="0"
                splitOnNumerics="0"
                catenateWords="1"
                catenateNumbers="0"
                catenateAll="0"
                preserveOriginal="1"
                />
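
To show what I expect the charFilter to do to the input, here is a minimal
sketch in Python. It is not Solr code: it assumes Python's re module treats
this pattern the same way the Java regex engine does, and the name
strip_italic_tags is made up for illustration.

```python
import re

# Same pattern as in the charFilter above, with &lt; unescaped to <.
ITALIC_TAG = re.compile(r"(</?ce:italic[^>]*>)")

def strip_italic_tags(text: str) -> str:
    """Remove opening and closing ce:italic tags, like replacement="" does."""
    return ITALIC_TAG.sub("", text)

tagged = "<ce:italic>p</ce:italic>-xylene"
plain = "p-xylene"

# Both inputs should reduce to the same text before tokenization.
print(strip_italic_tags(tagged) == plain)  # True
print(strip_italic_tags(plain) == plain)   # True
```

So as far as I can tell, the text handed to the tokenizer should be the same
in both cases.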

On the analysis page,
<ce:italic>p</ce:italic>-xylene and p-xylene (without XML tags) produce
tokens at different positions.

For <ce:italic>p</ce:italic>-xylene,
p-xylene --> 1
xylene --> 2
p --> 2
pxylene -->

However, for the term (without tags) p-xylene,
p-xylene --> 1
p --> 1
xylene --> 2
pxylene --> 3

The only difference I can see is in the start and end positions, which shift
because of the XML tags.
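
To make that shift concrete, here is a rough Python sketch that lists where
the surviving text sits in the original input. It only illustrates the
character positions in the raw string. It does not reproduce how Solr corrects
offsets internally, and surviving_spans is a hypothetical helper name.

```python
import re

ITALIC_TAG = re.compile(r"</?ce:italic[^>]*>")

def surviving_spans(text):
    """Return (start, end, chunk) for each run of text left after tag removal.

    start and end are character positions in the ORIGINAL input string.
    """
    spans, cursor = [], 0
    for m in ITALIC_TAG.finditer(text):
        if m.start() > cursor:
            spans.append((cursor, m.start(), text[cursor:m.start()]))
        cursor = m.end()
    if cursor < len(text):
        spans.append((cursor, len(text), text[cursor:]))
    return spans

print(surviving_spans("<ce:italic>p</ce:italic>-xylene"))
# [(11, 12, 'p'), (24, 31, '-xylene')]
print(surviving_spans("p-xylene"))
# [(0, 8, 'p-xylene')]
```

With the tags, the "p" and the "-xylene" parts are no longer adjacent in the
original input, which is the only difference I can find between the two cases.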

Does anyone know why?

Thanks,

Jae Joo
