Hello, I'm running into a case where a query is not returning the
results I expect, and I'm hoping someone can offer some explanation that
might help me fine-tune things or understand what's going on.
I am running Solr 4.3.
My filter chain includes a WordDelimiterFilter and, later, a filter that
downcases everything for case-insensitive searching. It includes many
other things too, but I think these are the pertinent facts.
For query "dELALAIN", the WordDelimiterFilter splits into:
  text        start   position
  d           0       1
  ELALAIN     1       2
  dELALAIN    0       2
Note the duplication/overlap of the tokens -- one version with "d" and
"ELALAIN" split into two tokens, and another with just one token.
Later, all the tokens are lowercased by another filter in the chain.
(Actually, it's an ICU filter that does something more complicated than
just lowercasing, but I think we can treat it as lowercasing for the
purposes of this discussion.)
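
To make the overlap easier to see, here is a minimal Python sketch that models the token stream reported above and groups it by position after lowercasing. It is not Solr code: `TOKENS`, `lowercase_and_group`, and the use of `str.lower()` as a stand-in for the ICU filter are all assumptions made for illustration.

```python
from collections import defaultdict

# Token stream as reported above: (text, start offset, position).
TOKENS = [
    ("d", 0, 1),
    ("ELALAIN", 1, 2),
    ("dELALAIN", 0, 2),
]

def lowercase_and_group(tokens):
    """Return {position: [lowercased token texts]} in emission order.

    str.lower() is only a stand-in for the ICU filter (an assumption).
    """
    grouped = defaultdict(list)
    for text, _start, position in tokens:
        grouped[position].append(text.lower())
    return dict(grouped)

print(lowercase_and_group(TOKENS))
# {1: ['d'], 2: ['elalain', 'delalain']}
```

In this model, "elalain" and "delalain" share position 2, while "d" sits alone at position 1.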
If I understand correctly what the WordDelimiterFilter is trying to do
here, it's probably treating the lowercase "d" followed by an uppercase
letter as a special case. (I don't get this behavior with other
mixed-case queries not beginning with 'd'.)
And what I think it's trying to do is match text indexed as "d
elalain" as well as text indexed as "delalain".
The problem is, it's not accomplishing that -- it is NOT matching text
that was indexed as "delalain" (one token).
I don't entirely understand what the "position" attribute is for -- but
I wonder if in this case, the position on "dELALAIN" is really supposed
to be 1, not 2? Could that be responsible for the bug? Or is position
irrelevant in this case?
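
One way to reason about whether position could matter is a toy model in which the query tokens are treated as a phrase-like sequence: every position must be satisfied, in order, by consecutive indexed terms. Whether Solr 4.3 actually builds the query this way for this field is an assumption, not something confirmed above. `QUERY_POSITIONS` and `matches_as_sequence` are hypothetical names.

```python
# Lowercased query tokens from above: {position: set of alternative terms}.
QUERY_POSITIONS = {1: {"d"}, 2: {"elalain", "delalain"}}

def matches_as_sequence(query_positions, doc_terms):
    """True if some run of consecutive doc terms satisfies every
    query position in order (a phrase-like reading; an assumption)."""
    ordered = [query_positions[p] for p in sorted(query_positions)]
    n = len(ordered)
    for i in range(len(doc_terms) - n + 1):
        if all(doc_terms[i + j] in ordered[j] for j in range(n)):
            return True
    return False

print(matches_as_sequence(QUERY_POSITIONS, ["delalain"]))      # False
print(matches_as_sequence(QUERY_POSITIONS, ["d", "elalain"]))  # True
```

Under this assumed reading, text indexed as the single token "delalain" cannot match, because nothing satisfies position 1 ("d") ahead of it. The same function can be rerun with other position assignments to test alternatives.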
If that's not it, then I'm at a loss as to what may be causing this bug
-- or even if it's a bug at all, or I'm just not understanding intended
behavior. I expect a query for "dELALAIN" to match text indexed as
"delalain" (because of the forced lowercasing in the filter chain). But
it's not doing so. Are my expectations wrong? Bug? Something else?
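
For contrast, the expectation stated above corresponds to a model in which the three lowercased tokens are independent alternatives, so that any one of them is enough for a match. This is only a sketch of that expectation, not a description of how Solr builds the query. `QUERY_TERMS` and `matches_any_term` are hypothetical names.

```python
# Lowercased query tokens, ignoring positions entirely (an assumption).
QUERY_TERMS = {"d", "elalain", "delalain"}

def matches_any_term(query_terms, doc_terms):
    """True if at least one indexed term equals one of the query terms."""
    return any(term in query_terms for term in doc_terms)

print(matches_any_term(QUERY_TERMS, ["delalain"]))      # True
print(matches_any_term(QUERY_TERMS, ["d", "elalain"]))  # True
```

If positions were ignored like this, text indexed as "delalain" would match, which is the behavior expected above.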
Thanks for any advice,
Jonathan