On 3/29/2018 1:48 PM, Kelvyn Scrupps wrote:
> I'm using WordDelimiterGraphFilter on a field and came across a curious 
> additional positional "hole" generated by the filter while playing with the 
> analysis tool.  
> For input "wibble , wobble" (space either side of the comma so it's a 
> separate token), the output introduces an additional positional hole after 
> the comma, i.e. 
>
> Term    position
> Wibble  1
> ,       2
> Wobble  4 *
>
> The positionlength for each is 1, so no obvious graph-span going on.
>
> It's not just the comma; any punctuation will do, e.g. "wibble ! wobble"

The wrinkle here is enabling preserveOriginal at the same time that you
have a term which is completely removed by the filter (in this case, the
comma).  If preserveOriginal is disabled, they both behave the same.  I
don't know if this is a bug or not.  My instinct is to say it's a bug,
but it's possible that this is expected.

Having a term that's just a punctuation character in the index is
generally not very useful ... but there are OTHER situations with this
filter where preserveOriginal *is* the behavior you want.  I would
imagine that as long as you don't have terms that completely disappear
when the filter runs, it would behave correctly.  Try replacing the ","
with "x," to see what I mean.

Also, FYI, when using a Graph filter, the index analysis chain must also
include this filter after it (the query analysis chain does not need it):

        <filter class="solr.FlattenGraphFilterFactory"/>

Adding that didn't seem to fix the behavior that concerns you, but the
docs do say it's required on the index analysis whenever using a Graph
filter.
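
For illustration, a field type laid out that way might look roughly like
the sketch below.  The name "text_wdg" and the choice of the whitespace
tokenizer are my assumptions (I don't know what your actual schema
contains), and preserveOriginal="1" is only there because that is the
setting under discussion:

```xml
<!-- Hypothetical field type; the name and tokenizer are assumptions. -->
<fieldType name="text_wdg" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterGraphFilterFactory" preserveOriginal="1"/>
    <!-- Flattens the token graph; only the index chain needs this. -->
    <filter class="solr.FlattenGraphFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterGraphFilterFactory" preserveOriginal="1"/>
  </analyzer>
</fieldType>
```

The two chains are identical apart from the flatten step, which should
make it easy to compare index and query output side by side in the
analysis tool.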

Thanks,
Shawn
