seeking feedback on edismax term-centric/field-centric proposal to resolve mm issue

Rudi Seitz Tue, 17 Jan 2023 11:46:32 -0800

Hi everyone,

I've been looking into a known issue where edismax sometimes switches from
a term-centric to a field-centric query generation style. This happens when
sow=false (split-on-whitespace disabled) and the per-field analyzers
generate differing numbers of tokens. It's a problem worth solving because
the mm (minimum should match) parameter then behaves inconsistently,
depending on which style a given query happens to get.


I wrote a proposal for fixing this in SOLR-16594
<https://issues.apache.org/jira/browse/SOLR-16594> and am gently nudging to
see if anyone has feedback on the proposal. Do you think this approach
might work, or could you help me by explaining why it wouldn't work? It'd
be great to hear from anyone who's interested in this topic, on the ticket
directly or via this email thread. Thanks in advance!

Rudi

PS. There's more detail in the ticket, including links to other tickets &
blog entries, but here's a summary:

1) The challenge in generating a term-centric query when sow=false is that
the tokens that come out of an analysis chain don't have explicit pointers
to the input terms that they should be grouped by.
2) When the field analyzers all generate the same number of tokens, edismax
rewrites an initial set of field-centric clauses as term-centric ones,
using clause position as a grouping heuristic, but this doesn't work when
the fields produce differing numbers of tokens.
3) The current proposal is to use the startOffset of a token as the basis
for doing term-centric grouping.
4) There's an implementation challenge here because startOffset is not
propagated to the Term objects that edismax works with, but it could be.
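
To make points 2) and 3) concrete, here is a minimal sketch of grouping by
startOffset. It is plain Python rather than Solr code, and everything in it
is an assumption made for illustration: the function name, the field names,
the sample query "the cat", and the idea that one field removes stopwords
while the other does not. It does not reflect how edismax or the SOLR-16594
proposal would actually be implemented.

```python
from collections import defaultdict


def group_by_start_offset(tokens_by_field):
    """Group analyzed tokens from all fields by their start offset.

    tokens_by_field maps a (hypothetical) field name to a list of
    (token_text, start_offset) pairs, as an analysis chain might emit
    them. Returns one group per distinct start offset, in offset order;
    each group lists the (field, token_text) pairs that came from the
    same position in the input string.
    """
    groups = defaultdict(list)
    for field, tokens in tokens_by_field.items():
        for text, start_offset in tokens:
            groups[start_offset].append((field, text))
    return [groups[offset] for offset in sorted(groups)]


# Hypothetical analysis of the query "the cat":
# "title" is assumed to drop the stopword "the", "body" keeps it,
# so the two fields yield different numbers of tokens.
analyzed = {
    "title": [("cat", 4)],
    "body": [("the", 0), ("cat", 4)],
}

term_centric = group_by_start_offset(analyzed)
for group in term_centric:
    print(group)
```

Grouping by clause position would pair title:cat with body:the here, since
both are the first token in their field. Grouping by start offset instead
keeps title:cat and body:cat together, and leaves body:the in a group of
its own.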
