Thanks, that's helpful. 

It still seems like the current behavior does the "wrong" thing in _many_ cases
(I know a lot of people get tripped up by it, sometimes on this list), but I
understand your cases where it does the right thing, and where what I'm
suggesting would be the wrong thing.

> Ultimately the problem you had with "&" is the same problem people have
> with stopwords, and comes down to the same thing: if you don't want some
> chunk of text to be "significant" when searching a field in your qf, have
> your analyzer remove it.

Ah, but the problem people have with stopwords arises precisely when they DID
do that. They didn't want a term to be 'significant' in one field, but they DID
want it to be 'significant' in another field... and how this affects 'mm' ends
up being counter-intuitive for some (but not all) setups/intentions. It's
counter-intuitive to me that adding a field to the 'qf' set results in _fewer_
hits than the same 'qf' set without the new field. I understand your cases
where you added the field to the 'qf' precisely to get that behavior, but
that's definitely not a universal case.

And less obvious differences in field analysis, ones that aren't as simple as
stopwords, can lead to the same problem (as in this case, where one field
ignores punctuation and the other doesn't). It's definitely a trap waiting for
some people.

I wonder if it would be a good idea to have an (e)dismax parameter that tells
it which of these two behaviors to use: one where the 'term count' is based on
the maximum number of terms produced by any field in the 'qf', and one where
it's based on the minimum number of terms produced by any field in the 'qf'.
I'm still not sure how feasible THAT is, but it seems like a good idea to me.
The current behavior is definitely a pitfall for many people.
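To make the two proposed modes concrete, here is a toy sketch in Python. It is
a simplified model of the proposal, not Solr's actual (e)dismax code: the
"max"/"min" strategy switch, the field names title_text and title_exact, and
the helper functions are all hypothetical, and mm is treated as a positive
percentage rounded down.

```python
from typing import Dict, List

# Toy model only: NOT how Solr's (e)dismax is implemented.
# Maps each qf field (hypothetical names) to the tokens its analyzer produced.
FieldTokens = Dict[str, List[str]]


def term_count(field_tokens: FieldTokens, strategy: str) -> int:
    """Clause count that mm is computed against, under a hypothetical switch."""
    counts = [len(tokens) for tokens in field_tokens.values()]
    if strategy == "max":  # largest token count across the qf fields
        return max(counts)
    if strategy == "min":  # smallest token count across the qf fields
        return min(counts)
    raise ValueError(f"unknown strategy: {strategy!r}")


def required_matches(field_tokens: FieldTokens, strategy: str, mm_percent: int) -> int:
    """Minimum clauses to match, treating mm as a positive percentage rounded down."""
    return term_count(field_tokens, strategy) * mm_percent // 100


# Query "cats & dogs": one analyzer strips punctuation, the other keeps "&".
analyzed = {
    "title_text": ["cats", "dogs"],
    "title_exact": ["cats", "&", "dogs"],
}
print(required_matches(analyzed, "max", 100))  # 3 (title_text alone yields only 2 tokens)
print(required_matches(analyzed, "min", 100))  # 2
```

In this toy model, "min" means adding a field whose analyzer keeps extra tokens
can never raise the bar, while "max" can, which is the 'fewer hits' surprise
described above.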

Or maybe a feature where you tell dismax: the number of tokens produced by
field X is the one to use as the 'term count' for mm, and all the other fields
are really just supplementary, for boosting or for bringing in a few more
results. That's not the case where you intentionally add a 'qf' field with
KeepWordsFilter to _reduce_ the result set, but I think it's a pretty common
use case too.
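A toy sketch of that 'designated field' idea, again in Python and again only a
model of the proposal rather than anything Solr provides: count_field is a
hypothetical parameter, and the field names are made up for illustration.

```python
from typing import Dict, List

# Toy model only: "count_field" is a hypothetical parameter, not a Solr option.
FieldTokens = Dict[str, List[str]]


def required_matches(field_tokens: FieldTokens, count_field: str, mm_percent: int) -> int:
    """mm is computed from count_field's token count alone; the other qf fields
    are treated as supplementary (boosting / recall) and never raise the bar."""
    if count_field not in field_tokens:
        raise KeyError(f"count_field {count_field!r} is not in qf")
    # Positive percentage, rounded down.
    return len(field_tokens[count_field]) * mm_percent // 100


analyzed = {
    "title_text": ["cats", "dogs"],        # designated (primary) field
    "title_exact": ["cats", "&", "dogs"],  # supplementary field
}
print(required_matches(analyzed, "title_text", 100))   # 2
print(required_matches(analyzed, "title_exact", 100))  # 3
```

Picking the field explicitly sidesteps the max/min question entirely: whatever
extra tokens the supplementary fields produce, only the designated field's
analysis decides how many clauses mm requires.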

Jonathan
