[ https://issues.apache.org/jira/browse/LUCENE-9725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Julie Tibshirani resolved LUCENE-9725.
--------------------------------------
Fix Version/s: 8.9
Resolution: Fixed
> Allow BM25FQuery to use other similarities
> ------------------------------------------
>
> Key: LUCENE-9725
> URL: https://issues.apache.org/jira/browse/LUCENE-9725
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Julie Tibshirani
> Priority: Major
> Fix For: 8.9
>
> Time Spent: 1h
> Remaining Estimate: 0h
>
> At a high level, BM25FQuery works as follows:
> # Given a list of fields and weights, it pretends there's a synthetic
> combined field where all terms have been indexed. It computes new term and
> collection statistics for this combined field.
> # It uses a disjunction iterator and BM25Similarity to score the documents.
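The two steps above can be illustrated in plain Java. This is a minimal sketch under stated assumptions, not BM25FQuery's actual code: `FieldStats` and `CombinedField` are hypothetical classes, the combined field's docFreq and docCount are approximated by the maximum over fields, and term frequencies and lengths are weight-scaled sums. It scores a single document and leaves out the disjunction iteration over postings. BM25 is written in a textbook form with `k1` and `b` as parameters.

```java
import java.util.List;

/** Hypothetical per-field inputs for one query term and one document. */
final class FieldStats {
  final double weight;         // boost applied to this field
  final long docFreq;          // documents whose field contains the term
  final long docCount;         // documents that have this field
  final long sumTotalTermFreq; // total tokens indexed in this field
  final int termFreq;          // occurrences of the term in this document's field
  final int fieldLength;       // tokens in this document's field (its norm)

  FieldStats(double weight, long docFreq, long docCount, long sumTotalTermFreq,
             int termFreq, int fieldLength) {
    this.weight = weight;
    this.docFreq = docFreq;
    this.docCount = docCount;
    this.sumTotalTermFreq = sumTotalTermFreq;
    this.termFreq = termFreq;
    this.fieldLength = fieldLength;
  }
}

/** Statistics of the synthetic combined field, scored with BM25. */
final class CombinedField {
  long docFreq;
  long docCount;
  double sumTotalTermFreq;
  double termFreq;
  double length;

  /** Step 1: merge per-field statistics into those of the combined field. */
  static CombinedField merge(List<FieldStats> fields) {
    CombinedField c = new CombinedField();
    for (FieldStats f : fields) {
      // Documents can match in several fields, so the exact combined docFreq and
      // docCount are unknown; the maximum over fields is a cheap lower bound.
      c.docFreq = Math.max(c.docFreq, f.docFreq);
      c.docCount = Math.max(c.docCount, f.docCount);
      // A field with weight w counts as if each of its tokens occurred w times.
      c.sumTotalTermFreq += f.weight * f.sumTotalTermFreq;
      c.termFreq += f.weight * f.termFreq;
      c.length += f.weight * f.fieldLength;
    }
    return c;
  }

  /** Step 2: BM25 on the combined field; the constant (k1 + 1) factor is omitted. */
  double bm25(double k1, double b) {
    double idf = Math.log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    double avgLength = sumTotalTermFreq / docCount;
    return idf * termFreq / (termFreq + k1 * (1 - b + b * length / avgLength));
  }
}
```

For example, with a title field of weight 2 and a body field of weight 1, a document with one title hit and two body hits gets a combined term frequency of 2 * 1 + 1 * 2 = 4, and its combined length is computed the same way.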
> The steps are (1) compute statistics that represent the combined field
> content, and (2) pass these to a similarity function. There is nothing really
> specific to BM25Similarity in this approach. In step 2, we could use another
> similarity, for example BooleanSimilarity or those based on language models
> like LMDirichletSimilarity. The main restriction is that norms have to be
> additive (the norm of the combined field must be the sum of the field norms).
> Maybe we could stop hardcoding BM25Similarity in BM25FQuery and instead use the
> one configured on IndexSearcher. We could think of this as providing a
> sensible default approach to cross-field scoring for many similarities. It's
> an incremental step towards LUCENE-8711, which would give similarities more
> fine-grained control over how stats/scores are combined across fields.
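A minimal sketch of the proposal, assuming a hypothetical `CombinedSimilarity` interface rather than Lucene's actual `Similarity`/`SimScorer` API: the merged statistics and the weighted per-document sums are computed once, and whichever similarity is configured scores them. `Bm25`, `Dirichlet` and `BooleanSim` are illustrative stand-ins for BM25Similarity, LMDirichletSimilarity and BooleanSimilarity using textbook formulas, so details such as smoothing of the collection probability or clamping of negative scores may differ from Lucene's.

```java
/** Hypothetical statistics of the synthetic combined field (already merged and weighted). */
final class CombinedStats {
  final long docCount;           // documents in the combined field
  final long docFreq;            // documents containing the term (approximated)
  final double totalTermFreq;    // weighted occurrences of the term in the collection
  final double sumTotalTermFreq; // weighted number of tokens in the collection

  CombinedStats(long docCount, long docFreq, double totalTermFreq, double sumTotalTermFreq) {
    this.docCount = docCount;
    this.docFreq = docFreq;
    this.totalTermFreq = totalTermFreq;
    this.sumTotalTermFreq = sumTotalTermFreq;
  }
}

/**
 * Hypothetical scoring hook, not Lucene's Similarity API. It only sees the combined
 * field: freq and length are weighted sums over the real fields, which is why the
 * length normalization must be additive.
 */
interface CombinedSimilarity {
  double score(CombinedStats stats, double freq, double length);
}

/** BM25 without the constant (k1 + 1) factor. */
final class Bm25 implements CombinedSimilarity {
  private final double k1, b;

  Bm25(double k1, double b) { this.k1 = k1; this.b = b; }

  public double score(CombinedStats s, double freq, double length) {
    double idf = Math.log(1 + (s.docCount - s.docFreq + 0.5) / (s.docFreq + 0.5));
    double avgLength = s.sumTotalTermFreq / s.docCount;
    return idf * freq / (freq + k1 * (1 - b + b * length / avgLength));
  }
}

/** Query likelihood with Dirichlet smoothing; can be negative for long documents. */
final class Dirichlet implements CombinedSimilarity {
  private final double mu;

  Dirichlet(double mu) { this.mu = mu; }

  public double score(CombinedStats s, double freq, double length) {
    double collectionProb = s.totalTermFreq / s.sumTotalTermFreq; // P(term | collection)
    return Math.log(1 + freq / (mu * collectionProb)) + Math.log(mu / (length + mu));
  }
}

/** Every matching document gets the same score. */
final class BooleanSim implements CombinedSimilarity {
  public double score(CombinedStats s, double freq, double length) {
    return freq > 0 ? 1.0 : 0.0;
  }
}

final class CombinedFieldScorer {
  /** Weighted sum of per-field values, used for both term frequency and length. */
  static double weightedSum(double[] weights, int[] perField) {
    double sum = 0;
    for (int i = 0; i < weights.length; i++) {
      sum += weights[i] * perField[i];
    }
    return sum;
  }

  /** Scores one document with whichever similarity is passed in. */
  static double score(CombinedSimilarity sim, CombinedStats stats,
                      double[] weights, int[] freqs, int[] lengths) {
    return sim.score(stats, weightedSum(weights, freqs), weightedSum(weights, lengths));
  }
}
```

Since every similarity receives a single length equal to the weighted sum of field lengths, this only works when the norm is additive. A token count qualifies; a norm such as the number of unique terms would not, because summing per-field unique counts overcounts terms that appear in several fields.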
--
This message was sent by Atlassian Jira
(v8.3.4#803005)