: not other) setups/intentions. It's counter-intuitive to me that adding
: a field to the 'qf' set results in _fewer_ hits than the same 'qf' set

agreed .. but that's where looking at the debug info comes in, to understand that the reason for that behavior is that your old qf treated part of your input as garbage, while the new field respects it and uses it in the calculation.

mind you: the "fewer hits" behavior only happens when using a percentage value in mm ... if you had mm=2 you'd get more results, but you've asked for "66%" (or whatever), and with that new qf there is a different number of clauses produced by query parsing.

: I wonder if it would be a good idea to have a parameter to (e)dismax
: that told it which of these two behaviors to use? The one where the
: 'term count' is based on the maximum number of terms from any field in
: the 'qf', and one where it's based on the minimum number of terms
: produced from any field in the qf? I am still not sure how feasible

even in your use case, i don't think you are fully considering what that would produce.

imagine that an mmType=min param existed and gave you what you're asking for. Now imagine that you have two fields, one named "simple" that strips all punctuation and one named "complex" that doesn't, and you have a query like this...

  q=Foo & Bar
  qf=simple complex
  mm=100%
  mmType=min

 * Foo produces tokens for all qf
 * & only produces tokens for some qf (complex)
 * Bar produces tokens for all qf

your mmType would say "there are only 2 tokens that we can query across all fields, so our computed minShouldMatch should be 100% of 2 == 2"

sounds good so far, right?

the problem is you still have a query clause coming from that "&" character ... you have 3 real clauses, one of which is that term query for "complex:&", which means that with your (computed) minShouldMatch of 2 you would see matches for any doc that happened to have indexed the "&" symbol in the "complex" field and also matched *either* of Foo or Bar (in either field).

So while a lot of your results would match both Foo and Bar, you'd still get a bunch of weird results.

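To make that scenario concrete, here is a rough Python sketch (not Solr code) of the clause counting described above. Everything in it is an assumption for illustration: the two toy analyzers, the three tiny documents, the one-clause-per-whitespace-chunk model, and the `mm_type` argument, which stands in for the imagined mmType param and does not exist in (e)dismax. Only positive integer and positive percentage mm values are modeled, with percentages rounded down.

```python
import re

def analyze_simple(chunk):
    """Hypothetical 'simple' analyzer: lowercases and strips all punctuation."""
    token = re.sub(r"[^\w]", "", chunk).lower()
    return [token] if token else []

def analyze_complex(chunk):
    """Hypothetical 'complex' analyzer: lowercases but keeps punctuation."""
    return [chunk.lower()]

ANALYZERS = {"simple": analyze_simple, "complex": analyze_complex}

def index(text):
    """Index a document: one set of tokens per field."""
    return {field: {tok for chunk in text.split() for tok in analyze(chunk)}
            for field, analyze in ANALYZERS.items()}

# Toy corpus (made up for this example).
DOCS = {"A": index("Foo & Bar"), "B": index("Foo Bar"), "C": index("Foo & Baz")}

def build_clauses(q, qf):
    """One clause per input chunk that yields a token in at least one qf field."""
    clauses = []
    for chunk in q.split():
        per_field = {f: ANALYZERS[f](chunk) for f in qf}
        per_field = {f: toks for f, toks in per_field.items() if toks}
        if per_field:
            clauses.append(per_field)
    return clauses

def required_matches(count, mm):
    """Simplified mm: a positive integer, or a positive percentage rounded down."""
    if mm.endswith("%"):
        return count * int(mm[:-1]) // 100
    return int(mm)

def hits(q, qf, mm, mm_type="clauses"):
    """Return names of docs matching at least the computed number of clauses."""
    clauses = build_clauses(q, qf)
    if mm_type == "min":
        # Imagined mmType=min: count only chunks that the "weakest" field kept.
        base = min(sum(1 for c in clauses if f in c) for f in qf)
    else:
        base = len(clauses)
    required = required_matches(base, mm)
    matched = []
    for name, doc in DOCS.items():
        count = sum(
            1 for clause in clauses
            if any(tok in doc[f] for f, toks in clause.items() for tok in toks)
        )
        if count >= required:
            matched.append(name)
    return sorted(matched)
```

In this toy model, `hits("Foo & Bar", ["simple"], "100%")` matches docs A and B, while adding "complex" to qf narrows it to A alone, because the "&" clause now counts toward the 100%. With `mm_type="min"` the required count drops back to 2 but the "&" clause is still there, so doc C ("Foo & Baz") comes back even though it never mentions Bar.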
: Or maybe a feature where you tell dismax, the number of tokens produced
: by field X, THAT's the one you should use for your 'term count' for mm,

Hmmm.... maybe. i'd have to see a patch in action and play with it, to really think it through ... hmmm ... honestly i really can't imagine how that would be helpful in general...

in order to use a feature like that you'd have to really think hard about the query analysis of your fields, and which ones will produce which tokens in which situations, in order to make sure you pick the *right* value for that param -- but once you've done that hard thinking you might as well feed it back into your schema.xml and say "the query analyzer for field 'complex' should prune any tokens that only contain punctuation" (instead of saying "'complex' will produce tokens that only contain punctuation, so let's tell dismax to compute mm based only on 'simple'").

After all, there might not be one single field that you can pick -- maybe 'complex' lets tokens that are all punctuation through but strips stopwords, and maybe 'simple' does the opposite ... no param value you pick will help you with that possibility, you really just need to fix the query analyzers to make sense if you want to use both of those two fields in the qf.

-Hoss