It looks like this duplicate token isn't filtered out because
WordDelimiter generates it after StopFilter has already run.
In any case, a search on "david david" against this field does find
documents with values like "David's" as well as "David, David,
David..."
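
To illustrate the ordering point, here is a toy sketch in plain Python. It is
not Lucene or Solr code: the stop list, the splitting rule, and the function
names are all made up for illustration. It only shows that a filter placed
earlier in the chain never sees tokens that a later filter emits.

```python
import re

# Hypothetical stop list, for illustration only.
STOPWORDS = {"the", "and"}

def stop_filter(tokens):
    """Drop any token that appears in the stop list."""
    return [t for t in tokens if t.lower() not in STOPWORDS]

def word_delimiter(tokens):
    """Toy stand-in for a word-delimiter step: split on non-alphanumerics."""
    out = []
    for t in tokens:
        out.extend(p for p in re.split(r"[^A-Za-z0-9]+", t) if p)
    return out

def analyze(text, filters):
    """Whitespace-tokenize, then apply each filter in the given order."""
    tokens = text.split()
    for f in filters:
        tokens = f(tokens)
    return tokens

# Stop filter first: "and" only appears after the split, so it survives.
stop_then_split = analyze("rock-and-roll", [stop_filter, word_delimiter])

# Split first: the stop filter now sees "and" and removes it.
split_then_stop = analyze("rock-and-roll", [word_delimiter, stop_filter])

print(stop_then_split)
print(split_then_stop)
```

If the real chain behaves the same way, moving the filtering step after
WordDelimiter would be one thing to try.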
Michael Della Bitta
---
Yes, that had occurred to me too, but I never saw the original query
from the developer who was having the trouble, only the text and the
strange analysis. I'll confer with him to make sure there's actually
something to work on here.
Michael Della Bitta
---
I agree that it would make more sense for the catenated word ("johnsons") to
be at the same position as the leading word ("johnson").
But what are some example queries that would "fail" given this behavior?
"johnson and johnson" would not falsely match since you have position
increment enable