Re: New scoring models in LUCENE/SOLR (LUCENE-2959)

Robert Muir Wed, 05 Oct 2011 12:24:47 -0700

On Wed, Oct 5, 2011 at 3:03 PM, David Ryan <help...@gmail.com> wrote:
> Do you mean both BM25 and BM25F?
>
>


No, BM25F and other "fielded" or structured models are somewhat different.

In these models, if you have two fields (body/title), you are saying
that "dogs" in body is actually the same term as "dogs" in title. This
is only appropriate in some cases, not for all fields in the document
(e.g. many Solr users use copyField and analyze the content in
different ways, so the terms are different, or they put different
languages in different fields).
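To make that concrete, here is a minimal Python sketch of a BM25F-style score for one query term. It uses the usual textbook form, not anything that exists in Lucene. The function name, its parameters, and the default k1=1.2 are illustrative assumptions. The point is that the per-field frequencies of "dogs" are pooled into one pseudo-frequency before saturation, which only makes sense if every field produces the same term.

```python
def bm25f_term_score(field_tf, field_len, avg_field_len, field_weight, field_b,
                     idf, k1=1.2):
    """Hypothetical BM25F-style score for a single term (sketch, not Lucene API).

    All dict arguments are keyed by field name, e.g. "title" and "body".
    Assumes the term is the *same* term in every field listed.
    """
    pseudo_tf = 0.0
    for field, tf in field_tf.items():
        # Per-field length normalization, as in plain BM25.
        b = field_b[field]
        norm = (1 - b) + b * field_len[field] / avg_field_len[field]
        # Weighted, normalized frequencies are summed into one pseudo-frequency.
        pseudo_tf += field_weight[field] * tf / norm
    # Saturation is applied once, to the pooled frequency, not per field.
    return idf * pseudo_tf / (k1 + pseudo_tf)


# "dogs" appears once in title and twice in body; field lengths equal the averages.
score = bm25f_term_score(
    field_tf={"title": 1, "body": 2},
    field_len={"title": 5, "body": 100},
    avg_field_len={"title": 5, "body": 100},
    field_weight={"title": 2.0, "body": 1.0},
    field_b={"title": 0.75, "body": 0.75},
    idf=1.0,
)
# pseudo_tf == 4.0 here, so score == 4.0 / 5.2
print(score)
```

If body and title were analyzed differently (say, stemmed vs. unstemmed, or different languages), pooling their counts like this would mix two different terms, which is why it can't be the default for all fields.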

In my opinion, to support models like this we should add a "structured
query" (and ideally queryparser hooks) that represents this intent: it
works for a term across multiple fields where you declare that this is
the case. This would be a future improvement, and I'm not sure BM25F is
ever a good fit, because it wants a document-level idf (not really
practical for Lucene, unless we come up with some cool approximation).
Newer models like PL2F/MDL2 instead use the corpus total term
frequency, which we can compute efficiently from the per-field
components.
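One way to see why the total term frequency is cheap while a document-level idf is not: occurrences are additive across fields, but documents are not, since a document containing "dogs" in both title and body must be counted once. Below is a minimal Python sketch over toy in-memory postings (dicts mapping doc id to within-field frequency). It assumes the index keeps per-field statistics for each term. The function names and data are hypothetical, and this is my reading of the argument, not code from Lucene.

```python
def combined_total_term_freq(per_field_ttf):
    # Total term frequency is additive: every occurrence lives in exactly one field,
    # so the per-field statistics can simply be summed.
    return sum(per_field_ttf.values())


def combined_doc_freq(per_field_postings):
    # Document frequency is NOT additive: doc ids must be unioned across fields,
    # which means walking the postings instead of summing stored statistics.
    docs = set()
    for postings in per_field_postings.values():
        docs.update(postings.keys())
    return len(docs)


# Toy postings for the term "dogs": doc id -> frequency within that field.
postings = {
    "title": {1: 1, 2: 1},
    "body": {2: 3, 3: 1},
}
per_field_ttf = {f: sum(p.values()) for f, p in postings.items()}  # title: 2, body: 4
per_field_df = {f: len(p) for f, p in postings.items()}            # title: 2, body: 2

print(combined_total_term_freq(per_field_ttf))  # 6, same as counting every occurrence
print(sum(per_field_df.values()))               # 4, overcounts doc 2
print(combined_doc_freq(postings))              # 3, the true document-level df
```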


-- 
lucidimagination.com
