Hi everyone,

I've been looking into a known issue where edismax sometimes switches from a term-centric to a field-centric query generation style. This happens when sow=false and the per-field analyzers generate differing numbers of tokens. It's a problem worth solving because it causes inconsistency with the semantics of the mm parameter.
I wrote a proposal for fixing this in SOLR-16594 <https://issues.apache.org/jira/browse/SOLR-16594> and am gently nudging to see if anyone has feedback on the proposal. Do you think this approach might work, or could you help me by explaining why it wouldn't work? It'd be great to hear from anyone who's interested in this topic, on the ticket directly or via this email thread.

Thanks in advance!
Rudi

PS. There's more detail in the ticket, including links to other tickets & blog entries, but here's a summary:

1) The challenge in generating a term-centric query when sow=false is that the tokens that come out of an analysis chain don't have explicit pointers to the input terms that they should be grouped by.

2) When the field analyzers all generate the same number of tokens, edismax rewrites an initial set of field-centric clauses as term-centric ones, using clause-position as a grouping heuristic, but this doesn't work if there are differing numbers of tokens.

3) The current proposal is to use the startOffset of a token as the basis for doing term-centric grouping.

4) There's an implementation challenge here because startOffset is not propagated to the Term objects that edismax works with, but it could be.
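
To make point 3 a bit more concrete, here is a toy sketch in Python of what grouping by startOffset could look like. It is not Solr or Lucene code: the Token type, the group_by_start_offset helper, the field names "title" and "body", and the hard-coded analyzer output are all hypothetical and only stand in for what two analysis chains might emit for the query "the quick fox", assuming "body" removes stopwords and "title" does not.

```python
from collections import defaultdict
from typing import NamedTuple


class Token(NamedTuple):
    """Hypothetical stand-in for an analyzed token that still carries its offset."""
    field: str
    text: str
    start_offset: int  # character offset of the token in the original query string


def group_by_start_offset(tokens):
    """Group tokens from all fields by where they start in the query string.

    Each group collects the per-field variants of one input term, which is
    the shape a term-centric query needs.
    """
    groups = defaultdict(list)
    for tok in tokens:
        groups[tok.start_offset].append((tok.field, tok.text))
    # Emit the groups in query order.
    return [groups[offset] for offset in sorted(groups)]


# Assumed analyzer output for the query "the quick fox" with sow=false:
# "title" keeps all three tokens, "body" drops the stopword "the".
tokens = [
    Token("title", "the", 0),
    Token("title", "quick", 4),
    Token("title", "fox", 10),
    Token("body", "quick", 4),
    Token("body", "fox", 10),
]

term_groups = group_by_start_offset(tokens)
for group in term_groups:
    print(group)
```

In this example the two fields produce different numbers of tokens (three versus two), so clause position would pair "the" in title with "quick" in body. Grouping on the offset instead yields one group per input term, with the stopword group containing only the title clause. The sketch ignores cases such as synonyms or other filters that emit several tokens at the same offset, which a real implementation would need to handle.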