I'm going to create a ticket for adding min/max scaling to the ReRanker. The ReRanker has access to all the topDocs, so it should be pretty straightforward to min/max scale the scores of all the topDocs before re-ranking the topN.
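
To make the idea concrete, here is a rough sketch in Python rather than the actual Solr code. Everything in it is an assumption for illustration: the function names, the `rerank_weight` parameter, and the additive way the two scores are combined are all hypothetical, and the real ReRanker change may look quite different. It scales the first-pass scores of all the topDocs into [0, 1], then combines the scaled score with the re-rank score for the topN only.

```python
from typing import Dict, List, Tuple


def min_max_scale(scores: List[float]) -> List[float]:
    """Scale scores into [0, 1] using the min and max of the whole list."""
    if not scores:
        return []
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All scores are identical, so there is no range to scale over.
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def rerank_with_scaling(
    top_docs: List[Tuple[str, float]],
    rerank_scores: Dict[str, float],
    top_n: int,
    rerank_weight: float = 1.0,
) -> List[Tuple[str, float]]:
    """Hypothetical re-rank step.

    top_docs: (doc_id, main_query_score) pairs, sorted by score descending.
    rerank_scores: doc_id -> score from the re-rank query (e.g. a vector score).
    """
    # Scale over ALL topDocs, not just the topN, so min and max reflect
    # the full first-pass result set.
    scaled = min_max_scale([score for _, score in top_docs])
    docs = [(doc_id, s) for (doc_id, _), s in zip(top_docs, scaled)]

    # Only the first top_n docs are re-scored; additive combination is an
    # assumption made for this sketch.
    head = [
        (doc_id, s + rerank_weight * rerank_scores.get(doc_id, 0.0))
        for doc_id, s in docs[:top_n]
    ]
    head.sort(key=lambda pair: pair[1], reverse=True)

    # Docs beyond top_n keep their original order and their scaled score.
    return head + docs[top_n:]


if __name__ == "__main__":
    first_pass = [("a", 10.0), ("b", 6.0), ("c", 2.0)]
    vector_scores = {"a": 0.25, "b": 1.0, "c": 0.5}
    for doc_id, score in rerank_with_scaling(first_pass, vector_scores, top_n=2):
        print(doc_id, score)
```

One thing the sketch makes visible: because the min and max come from the current result set, the scaled score of a document depends on which other documents matched, so scaled scores are not comparable across queries.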

Joel Bernstein
http://joelsolr.blogspot.com/

On Thu, May 25, 2023 at 5:18 AM Alessandro Benedetti <a.benede...@sease.io> wrote:

> Hi all,
> our approach to providing hybrid search in Solr has been focused on the
> reranking side, specifically enabling vector-based features in Learning To
> Rank.
> In this way, you can combine lexical features (such as the original BM25
> score) with various vector distances (in more than one field if you like)
> and other factors using whatever model is supported (linear, tree-based,
> neural network).
> First-stage hybrid retrieval should already be decently
> available through the boolean query parser.
>
> We started the work with function queries (which unfortunately are
> scattered across Lucene and Solr, and now that the projects are separate
> again, it's a lengthy process to go through).
> Our first step is almost ready:
> https://github.com/apache/lucene/pull/12253
> Any feedback is welcome!
>
> Then, regarding the different problem of having an unbounded relevance score
> in Lucene/Solr, I agree that it can (and should) be improved. I would love to
> see it as a probabilistic score, but I imagine that making this change in
> Lucene will cause an enormous discussion, probably ending in a standstill?
> You have my support!
>
>
> --------------------------
> *Alessandro Benedetti*
> Director @ Sease Ltd.
> *Apache Lucene/Solr Committer*
> *Apache Solr PMC Member*
>
> e-mail: a.benede...@sease.io
>
>
> *Sease* - Information Retrieval Applied
> Consulting | Training | Open Source
>
> Website: Sease.io <http://sease.io/>
> LinkedIn <https://linkedin.com/company/sease-ltd> | Twitter
> <https://twitter.com/seaseltd> | Youtube
> <https://www.youtube.com/channel/UCDx86ZKLYNpI3gzMercM7BQ> | Github
> <https://github.com/seaseltd>
>
>
> On Tue, 23 May 2023 at 19:17, Mikhail Khludnev <m...@apache.org> wrote:
>
> > Hello, Joel.
> >
> > Here's my idea
> > https://lists.apache.org/thread/6t45p5fk4hldrt1833kvrbobdd2pk265
> >
> >
> > On Tue, May 23, 2023 at 6:20 PM Joel Bernstein <joels...@gmail.com> wrote:
> >
> > > One of the things that I'm focusing on is combining the Solr similarity
> > > score with the vector score in a consistent manner. My main concern is
> > > dealing with the unbounded nature of the Solr similarity score and how to
> > > balance that with a vector score.
> > >
> > > So my first question is: are there any mechanisms now to scale or squash the
> > > Solr similarity score before combining with a vector score?
> > >
> > > Below are two ideas I have for squashing / scaling the score:
> > >
> > > 1) SquashingScoreQuery. This is a wrapper query that squashes the score of
> > > its wrapped query using a sigmoid function.
> > >
> > > 2) Min/Max scale the main query score in the ReRanker. This simply adds a
> > > flag to the ReRanker to min/max scale the main query scores before
> > > combining with the ReRank query.
> > >
> > > Do others have thoughts on this?
> > >
> >
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
> > https://t.me/MUST_SEARCH
> > A caveat: Cyrillic!
> >
>