On 3/29/2018 1:48 PM, Kelvyn Scrupps wrote:
> I'm using WordDelimiterGraphFilter on a field and came across a curious
> additional positional "hole" generated by the filter while playing with the
> analysis tool.
> For input "wibble , wobble" (space either side of the comma so it's a
> separate token), the output introduces an additional positional hole after
> the comma, i.e.
>
> Term     position
> Wibble   1
> ,        2
> Wobble   4 *
>
> The positionlength for each is 1, so no obvious graph-span going on.
>
> Its not just comma, any punctuation would do, e.g. "wibble ! wobble"

The wrinkle here is enabling preserveOriginal at the same time that you have a term which is completely removed by the filter (in this case, the comma). If preserveOriginal is disabled, they both behave the same.

I don't know if this is a bug or not. My instinct is to say it's a bug, but it's possible that this is expected.

Having a term that's just a punctuation character in the index is generally not very useful ... but there are OTHER situations with this filter where preserveOriginal *is* the behavior you want. I would imagine that as long as you don't have terms that completely disappear when the filter runs, it would behave correctly. Try replacing the "," with "x," to see what I mean.

Also, FYI, when using a Graph filter, the index analysis chain must also have this filter (but not the query analysis):

    <filter class="solr.FlattenGraphFilterFactory"/>

Adding that didn't seem to fix the behavior that concerns you, but the docs do say it's required on the index analysis whenever using a Graph filter.

Thanks,
Shawn
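
P.S. For reference, here is a minimal sketch of how the two analysis chains could be laid out with the flatten filter on the index side only. The field type name "text_wdg" is hypothetical, and the whitespace tokenizer is an assumption based on the comma coming through as its own token; the rest of the original chain isn't shown in this thread, so adapt it to the actual schema.

```xml
<!-- Sketch only: "text_wdg" is a made-up name and the tokenizer is assumed. -->
<fieldType name="text_wdg" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- preserveOriginal="1" keeps the unsplit token alongside its parts -->
    <filter class="solr.WordDelimiterGraphFilterFactory" preserveOriginal="1"/>
    <!-- Index side only: flattens the token graph so it can be indexed -->
    <filter class="solr.FlattenGraphFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Query side keeps the graph, so no flatten filter here -->
    <filter class="solr.WordDelimiterGraphFilterFactory" preserveOriginal="1"/>
  </analyzer>
</fieldType>
```

With something like this in place, running "wibble , wobble" and then "wibble x, wobble" through the analysis tool should make it easy to compare the positions in the two cases.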