Hi,

Once I apply PatternReplaceCharFilterFactory to the input string, the token positions change. Here is an example:

<charFilter class="solr.PatternReplaceCharFilterFactory" pattern="(</?ce:italic[^>]*>)" replacement=""/>
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" splitOnCaseChange="0" splitOnNumerics="0" catenateWords="1" catenateNumbers="0" catenateAll="0" preserveOriginal="1" />
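As a sanity check outside Solr, here is a minimal Python sketch of what the char filter pattern does to the tagged input. It is only an illustration: it uses Python's `re` module rather than the Java regex engine behind PatternReplaceCharFilterFactory (I am assuming this particular pattern behaves the same in both), and the helper name `strip_italic_tags` is made up for this example.

```python
import re

# Same pattern and (empty) replacement as in the charFilter definition.
PATTERN = re.compile(r"(</?ce:italic[^>]*>)")


def strip_italic_tags(text):
    """Remove opening and closing ce:italic tags (hypothetical helper)."""
    return PATTERN.sub("", text)


tagged = "<ce:italic>p</ce:italic>-xylene"
plain = "p-xylene"

# Text that would reach the tokenizer after the char filter.
stripped = strip_italic_tags(tagged)
print("stripped:", stripped)
print("same as plain input:", stripped == plain)

# Character spans of the removed tags in the original input. These are
# the regions that shift the start/end offsets of the surviving text.
removed_spans = [(m.start(), m.end()) for m in PATTERN.finditer(tagged)]
print("removed spans:", removed_spans)
```

If this mirrors what the char filter does, the text reaching the tokenizer should be the same for the tagged and the untagged input, and only the mapping of offsets back to the original string differs.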
In the analysis page, <ce:italic>p</ce:italic>-xylene and p-xylene (without the XML tags) produce different token positions.

For <ce:italic>p</ce:italic>-xylene:

p-xylene --> 1
xylene --> 2
p --> 2
pxylene -->

However, for the term without tags, p-xylene:

p-xylene --> 1
p --> 1
xylene --> 2
pxylene --> 3

The only difference I can see is in the start and end offsets, which differ because of the XML tags. Does anyone know why the positions change?

Thanks,
Jae Joo