Re: Hybrid scoring lexical / vector

Joel Bernstein Fri, 26 May 2023 13:59:06 -0700

I'm going to create a ticket for adding Min/Max scaling to the ReRanker.
The ReRanker has access to all the topDocs so it should be pretty
straightforward to min/max scale all the topDocs before ReRanking the topN.
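
As a rough illustration of the idea (a sketch only, not the actual Solr
ReRanker code), min/max scaling maps every main-query score s in the topDocs
to (s - min) / (max - min), so the scores fall in [0, 1] before the rerank
score is added for the topN. The function names, the additive combination,
and the rerank_weight parameter below are assumptions made for this example.

```python
from typing import Dict, List, Tuple


def min_max_scale(scores: List[float]) -> List[float]:
    """Scale scores linearly into [0, 1] using the min and max of the list."""
    if not scores:
        return []
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All scores are equal; scaling is undefined, so return zeros (assumption).
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def rerank_with_scaling(
    top_docs: List[Tuple[str, float]],
    rerank_scores: Dict[str, float],
    top_n: int,
    rerank_weight: float = 1.0,
) -> List[Tuple[str, float]]:
    """Scale all main-query scores, then rerank only the first top_n docs.

    top_docs: (doc_id, main_score) pairs, sorted by main_score descending.
    rerank_scores: hypothetical rerank-query scores (e.g. vector similarity).
    """
    scaled = min_max_scale([score for _, score in top_docs])
    combined = []
    for i, ((doc_id, _), score) in enumerate(zip(top_docs, scaled)):
        if i < top_n:
            # Only the top_n docs receive the weighted rerank score.
            score += rerank_weight * rerank_scores.get(doc_id, 0.0)
        combined.append((doc_id, score))
    # Re-sort the reranked head; the tail keeps its original order.
    head = sorted(combined[:top_n], key=lambda pair: pair[1], reverse=True)
    return head + combined[top_n:]


if __name__ == "__main__":
    docs = [("a", 12.0), ("b", 8.0), ("c", 4.0), ("d", 2.0)]
    vector_scores = {"a": 0.1, "b": 0.9, "c": 0.5}
    for doc_id, score in rerank_with_scaling(docs, vector_scores, top_n=3):
        print(doc_id, round(score, 3))
```

Because the min and max are taken over all the topDocs rather than only the
topN, the scaled main-query scores stay comparable across the whole result
set.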



Joel Bernstein
http://joelsolr.blogspot.com/


On Thu, May 25, 2023 at 5:18 AM Alessandro Benedetti <a.benede...@sease.io>
wrote:

> Hi all,
> our approach to providing hybrid search in Solr has been focused on the
> reranking side, specifically enabling vector-based features in Learning To
> Rank.
> In this way, you can combine lexical features (such as the original BM25
> score) with various vector distances (in more than one field if you like)
> and other factors using whatever model is supported (linear, tree-based,
> neural network).
> First-stage hybrid retrieval should already be reasonably well supported
> through the boolean query parser.
>
> We started the work with function queries (which unfortunately are
> scattered across Lucene and Solr; now that the projects are separate
> again, it's a lengthy process to work across both).
> Our first step is almost ready:
> https://github.com/apache/lucene/pull/12253
> Any feedback is welcome!
>
> Then, regarding the separate problem of having an unbounded relevance score
> in Lucene/Solr, I agree that it can (and should) be improved. I would love
> to see it as a probabilistic score, but I imagine that making this change in
> Lucene will cause an enormous discussion, probably ending in a standstill.
> You have my support!
>
>
> --------------------------
> *Alessandro Benedetti*
> Director @ Sease Ltd.
> *Apache Lucene/Solr Committer*
> *Apache Solr PMC Member*
>
> e-mail: a.benede...@sease.io
>
>
> *Sease* - Information Retrieval Applied
> Consulting | Training | Open Source
>
> Website: Sease.io <http://sease.io/>
> LinkedIn <https://linkedin.com/company/sease-ltd> | Twitter
> <https://twitter.com/seaseltd> | Youtube
> <https://www.youtube.com/channel/UCDx86ZKLYNpI3gzMercM7BQ> | Github
> <https://github.com/seaseltd>
>
>
> On Tue, 23 May 2023 at 19:17, Mikhail Khludnev <m...@apache.org> wrote:
>
> > Hello, Joel.
> >
> > Here's my idea
> > https://lists.apache.org/thread/6t45p5fk4hldrt1833kvrbobdd2pk265
> >
> >
> > On Tue, May 23, 2023 at 6:20 PM Joel Bernstein <joels...@gmail.com>
> > wrote:
> >
> > > One of the things that I'm focusing on is combining the Solr similarity
> > > score with the vector score in a consistent manner. My main concern is
> > > dealing with the unbounded nature of the Solr similarity score and how
> > > to balance that with a vector score.
> > >
> > > So my first question is: are there any mechanisms now to scale or
> > > squash the Solr similarity score before combining it with a vector
> > > score?
> > >
> > > Below are two ideas I have for squashing / scaling the score:
> > >
> > > 1) SquashingScoreQuery. This is a wrapper query that squashes the score
> > > of its wrapped query using a sigmoid function.
> > >
> > > 2) Min/Max scale the main query score in the ReRanker. This simply adds
> > > a flag to the ReRanker to min/max scale the main query scores before
> > > combining with the ReRank query.
> > >
> > > Do others have thoughts on this?
> > >
> >
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
> > https://t.me/MUST_SEARCH
> > A caveat: Cyrillic!
> >
>
