[ https://issues.apache.org/jira/browse/LUCENE-8216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17357264#comment-17357264 ]

Jim Ferenczi commented on LUCENE-8216:
--------------------------------------

> From a quick look at your code, the document frequency is just calculated as
> the max document frequency across all the fields involved (and that is
> actually the lower bound of the real blended document frequency).
> Was it done with this approximation for simplicity, or is there any other reason?

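We need an approximation because computing the "real" frequency (the number of
documents that contain the term in at least one of the fields) is too costly.
We also need the approximation to be bounded by max_doc, and the max per-field
frequency does the job, since no single field's frequency can exceed max_doc.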
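To make the trade-off concrete, here is a minimal, hypothetical sketch (plain Java, not Lucene's API; the class and method names are illustrative assumptions). It compares the cheap approximation (max of the per-field document frequencies) with the exact blended value, which here is the size of the union of the per-field matching documents, modeled with java.util.BitSet. The max is always a lower bound of the union and never exceeds maxDoc, whereas the exact value requires visiting every field's matching documents.

```java
import java.util.Arrays;
import java.util.BitSet;

/** Hypothetical sketch: approximate vs exact blended document frequency. */
public class BlendedDocFreq {

    /** Cheap approximation: the largest per-field doc freq (a lower bound, never above maxDoc). */
    static long approxBlendedDocFreq(long[] perFieldDocFreq) {
        return Arrays.stream(perFieldDocFreq).max().orElse(0L);
    }

    /** Exact value: number of docs containing the term in at least one field (costly in practice). */
    static long exactBlendedDocFreq(BitSet... perFieldDocs) {
        BitSet union = new BitSet();
        for (BitSet docs : perFieldDocs) {
            union.or(docs); // must visit every matching doc of every field
        }
        return union.cardinality();
    }

    public static void main(String[] args) {
        int maxDoc = 10;
        BitSet title = new BitSet(maxDoc);
        title.set(0, 4); // term in "title" of docs 0..3
        BitSet body = new BitSet(maxDoc);
        body.set(2, 8);  // term in "body" of docs 2..7 (overlaps with title)

        long approx = approxBlendedDocFreq(new long[] {title.cardinality(), body.cardinality()});
        long exact = exactBlendedDocFreq(title, body);
        // Summing the per-field freqs (4 + 6) would already reach maxDoc; the max stays below it.
        System.out.println(approx + " <= " + exact + " <= " + maxDoc);
    }
}
```

Running `main` prints `6 <= 8 <= 10`: the approximation underestimates the true blended frequency but stays safely within max_doc, while a naive sum over fields can overshoot it as soon as fields overlap.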

> Better cross-field scoring
> --------------------------
>
>                 Key: LUCENE-8216
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8216
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Assignee: Jim Ferenczi
>            Priority: Major
>             Fix For: 8.0
>
>         Attachments: LUCENE-8216.patch, LUCENE-8216.patch
>
>
> I'd like Lucene to have better support for scoring across multiple fields. 
> Today we have BlendedTermQuery which tries to help there but it probably 
> tries to do too much on some aspects (handling cross-field term queries AND 
> synonyms) and too little on other ones (it tries to merge index-level 
> statistics, but not per-document statistics like tf and norm).
> Maybe we could implement something like BM25F so that queries across multiple 
> fields would retain the benefits of BM25 like the fact that the impact of the 
> term frequency saturates quickly, which is not the case with BlendedTermQuery 
> if you have occurrences across many fields.
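The saturation argument in the description can be illustrated with a simplified, hypothetical BM25F-style sketch (not Lucene's implementation). Assumptions: idf is omitted, k1 = 1.2, saturation uses the form tf / (k1 + tf), and a single b is shared by all fields. Summing independently saturated per-field scores grows with the number of fields, while BM25F first combines the weighted, length-normalized term frequencies and saturates once, so the result stays below 1.

```java
/** Hypothetical sketch: per-field BM25 sum vs BM25F-style blending (idf omitted). */
public class Bm25fSketch {

    static final double K1 = 1.2; // assumed saturation parameter

    /** Saturating transform: approaches 1 as tf grows. */
    static double saturate(double tf) {
        return tf / (K1 + tf);
    }

    /** BM25-style length normalization of a raw term frequency. */
    static double normalizedTf(double tf, double len, double avgLen, double b) {
        return tf / (1 - b + b * len / avgLen);
    }

    /** Each field is saturated on its own, then the scores are summed. */
    static double perFieldSum(double[] tfs, double[] lens, double[] avgLens, double b) {
        double score = 0;
        for (int i = 0; i < tfs.length; i++) {
            score += saturate(normalizedTf(tfs[i], lens[i], avgLens[i], b));
        }
        return score;
    }

    /** BM25F-style: combine weighted normalized tfs first, then saturate once. */
    static double bm25f(double[] tfs, double[] lens, double[] avgLens, double[] weights, double b) {
        double combined = 0;
        for (int i = 0; i < tfs.length; i++) {
            combined += weights[i] * normalizedTf(tfs[i], lens[i], avgLens[i], b);
        }
        return saturate(combined);
    }

    public static void main(String[] args) {
        double[] tfs = {2, 2, 2};       // term occurs twice in each of three fields
        double[] lens = {10, 10, 10};   // field lengths equal to the averages
        double[] weights = {1, 1, 1};
        double b = 0.75;
        System.out.println("per-field sum = " + perFieldSum(tfs, lens, lens, b));
        System.out.println("BM25F-style   = " + bm25f(tfs, lens, lens, weights, b));
    }
}
```

With these inputs the per-field sum is 3 × 2 / 3.2 = 1.875, while the BM25F-style score is 6 / 7.2 ≈ 0.833; adding more fields with matches keeps pushing the former up but leaves the latter bounded by 1.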



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
