On Wed, Oct 5, 2011 at 3:03 PM, David Ryan <help...@gmail.com> wrote:
> Do you mean both BM25 and BM25F?

No, BM25F and other "fielded" or structured models are somewhat different. In these models, if you have two fields (body/title), you are saying that "dogs" in body is actually the same term as "dogs" in title. That is appropriate only for fields where it really holds, not for all fields in the document (e.g. many Solr users use copyField and analyze content in different ways, so the terms are different, or put different languages in different fields).

In my opinion, to support models like this we should add a "structured query" (and ideally queryparser hooks) representing this intent, one that works for a term across multiple fields where you declare this is the case. This would be a future improvement, and I'm not sure BM25F is ever a good fit, because it wants a document-level idf (not really a practical thing for Lucene, unless we come up with some cool approximation). Newer models like PL2F/MDL2 use corpus total term frequency instead, which we can compute from the components efficiently.

-- 
lucidimagination.com
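
P.S. A rough sketch of that last point, in plain Python rather than Lucene code. The toy corpus, the field names, and every function name here are made up for illustration; nothing below is an existing Lucene API. The idea it tries to show: total term frequency across fields is just the sum of the per-field totals, while a document-level document frequency is not recoverable from per-field counts, because summing them double-counts documents that match in more than one field.

```python
from collections import Counter

# Hypothetical toy corpus: each document has a "title" and a "body" field,
# already tokenized. Names and data are for illustration only.
docs = [
    {"title": ["dogs"], "body": ["dogs", "chase", "cats"]},
    {"title": ["cats"], "body": ["dogs", "dogs"]},
    {"title": ["birds"], "body": ["birds", "sing"]},
]
fields = ["title", "body"]


def per_field_stats(docs, field):
    """Return (total term frequency, document frequency) counters for one field."""
    ttf, df = Counter(), Counter()
    for doc in docs:
        tokens = doc[field]
        ttf.update(tokens)      # every occurrence counts
        df.update(set(tokens))  # each document counts once per term
    return ttf, df


# Per-field statistics, the "components" that are kept per field.
stats = {f: per_field_stats(docs, f) for f in fields}


def combined_ttf(term):
    # Occurrences in different fields are disjoint, so per-field totals add up.
    return sum(stats[f][0][term] for f in fields)


def summed_df(term):
    # Summing per-field document frequencies double-counts documents that
    # contain the term in more than one field.
    return sum(stats[f][1][term] for f in fields)


def true_document_df(term):
    # A document-level df needs the union of matching documents across fields,
    # which means going back to the documents themselves.
    return sum(1 for doc in docs if any(term in doc[f] for f in fields))


print(combined_ttf("dogs"), summed_df("dogs"), true_document_df("dogs"))
```

For "dogs" in this toy data, the combined total term frequency is exact, while the summed per-field document frequency overshoots the true document-level count, since the first document matches in both title and body.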