Hello, I'm running into a case where a query is not returning the results I expect, and I'm hoping someone can offer an explanation that helps me fine-tune things or understand what's going on.

I am running Solr 4.3.

My filter chain includes a WordDelimiterFilter and, later, a filter that lowercases everything for case-insensitive searching. It includes many other things too, but I think these are the pertinent facts.

For the query "dELALAIN", the WordDelimiterFilter produces these tokens:

text: d
start: 0
position: 1

text: ELALAIN
start: 1
position: 2

text: dELALAIN
start: 0
position: 2

Note the duplication/overlap of the tokens -- one version with "d" and "ELALAIN" split into two tokens, and another with just one token.

Later, all the tokens are lowercased by another filter in the chain. (It is actually an ICU filter that does something more complicated than just lowercasing, but I think we can treat it as lowercasing for the purposes of this discussion.)
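
To make the end state concrete, here is a minimal sketch that takes the three tokens exactly as reported above, lowercases them, and groups them by position. It is a toy model in plain Python, not Lucene or Solr code; the tuple layout and the function name lowercase_and_group are made up for illustration, and str.lower() stands in for the ICU filter.

```python
from collections import defaultdict

# Tokens exactly as reported above: (text, start offset, position).
tokens = [
    ("d", 0, 1),
    ("ELALAIN", 1, 2),
    ("dELALAIN", 0, 2),
]

def lowercase_and_group(tokens):
    """Lowercase each token and group the results by position.

    str.lower() is only a stand-in for the real ICU filter.
    """
    by_position = defaultdict(list)
    for text, _start, position in tokens:
        by_position[position].append(text.lower())
    return dict(by_position)

grouped = lowercase_and_group(tokens)
print(grouped)  # {1: ['d'], 2: ['elalain', 'delalain']}
```

So after lowercasing, position 1 holds only "d", while "elalain" and "delalain" share position 2.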

If I understand correctly what the WordDelimiterFilter is trying to do here, it is probably treating the lowercase "d" followed by an uppercase letter as a special case. (I don't get this behavior with other mixed-case queries that don't begin with 'd'.)
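
For comparison, here is a rough sketch of what a generic split on lowercase-to-uppercase transitions would do. This is an assumption about the mechanism, not the filter's actual implementation: the regex and the name split_on_case_change are illustrative only, and it handles ASCII letters only.

```python
import re

def split_on_case_change(word):
    """Split wherever a lowercase ASCII letter is directly followed by an uppercase one.

    Toy approximation only; the real filter has more rules and options.
    """
    # Insert a space at each lower->upper boundary, then split on whitespace.
    marked = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", word)
    return marked.split()

print(split_on_case_change("dELALAIN"))  # ['d', 'ELALAIN']
print(split_on_case_change("DELALAIN"))  # ['DELALAIN']
```

Under that assumption, the split would come from the case change itself rather than from the letter 'd' specifically, which would be worth checking against the behavior I'm seeing.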

And what I think it's trying to do is match text indexed as "d elalain" as well as text indexed as "delalain".

The problem is, it's not accomplishing that -- it is NOT matching text that was indexed as "delalain" (one token).

I don't entirely understand what the "position" attribute is for -- but I wonder if in this case, the position on "dELALAIN" is really supposed to be 1, not 2? Could that be responsible for the bug? Or is position irrelevant in this case?
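
For reference, Lucene tracks a position increment per token, and the position shown in analysis output is the running sum of those increments; an increment of 0 places a token at the same position as the one before it. A minimal sketch of that bookkeeping follows. The increments in the example are hypothetical values chosen to reproduce the positions reported above, not values read from the actual filter, and positions_from_increments is an illustrative name.

```python
def positions_from_increments(tokens):
    """Turn (text, position_increment) pairs into (text, absolute_position) pairs."""
    position = 0
    result = []
    for text, increment in tokens:
        # An increment of 0 stacks the token on the previous token's position.
        position += increment
        result.append((text, position))
    return result

# Hypothetical increments that would reproduce the positions reported above.
stream = [("d", 1), ("ELALAIN", 1), ("dELALAIN", 0)]
print(positions_from_increments(stream))
# [('d', 1), ('ELALAIN', 2), ('dELALAIN', 2)]
```

If that is what is happening, "dELALAIN" ends up at position 2 because it was emitted with an increment of 0 after "ELALAIN", rather than alongside "d" at position 1.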

If that's not it, then I'm at a loss as to what may be causing this bug -- or even if it's a bug at all, or I'm just not understanding intended behavior. I expect a query for "dELALAIN" to match text indexed as "delalain" (because of the forced lowercasing in the filter chain). But it's not doing so. Are my expectations wrong? Bug? Something else?

Thanks for any advice,

Jonathan
