Re: Configurable collectors for custom ranking

Peter Keegan Fri, 07 Mar 2014 11:12:57 -0800

Hi Joel,

Although I solved this issue with a custom CollectorFactory, I also have a
solution that uses a PostFilter and an optional ValueSource.
Could you take a look at SOLR-5831 and see if I've got this right?


Thanks,
Peter



On Mon, Dec 23, 2013 at 6:37 PM, Joel Bernstein <joels...@gmail.com> wrote:

> Peter,
>
> You actually only need the current score being collected to be in the
> request context. So you don't need a map, you just need an object wrapper
> around a mutable float.
>
> If you have a page size of X, only the top X scores need to be held onto,
> because all the other scores wouldn't have made it into that page anyway so
> they might as well be 0. Because the QueryResultCache caches a larger
> window than the page size, you should keep enough scores so the cached
> docList is correct. But if you're only dealing with 150K results you
> could just keep all the scores in a FloatArrayList and not worry about
> keeping the top X scores in a priority queue.
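
A minimal sketch of the top-X idea above, in plain Java with no Solr or Lucene types. The names `TopScores`, `cutoff` and `zeroBelowWindow` are made up for illustration, and `window` stands in for however many scores the cached window needs. It uses a bounded min-heap to find the lowest score that still fits in the window and treats everything below it as 0.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

// Hypothetical helper, not part of Solr: keeps only the top `window` scores.
final class TopScores {

    // Lowest score that still belongs to the top `window` scores.
    static float cutoff(float[] scores, int window) {
        if (window <= 0) {
            return Float.POSITIVE_INFINITY; // nothing can make the window
        }
        PriorityQueue<Float> heap = new PriorityQueue<>(); // min-heap
        for (float s : scores) {
            if (heap.size() < window) {
                heap.add(s);
            } else if (s > heap.peek()) {
                heap.poll(); // evict the current weakest score
                heap.add(s);
            }
        }
        return heap.isEmpty() ? Float.POSITIVE_INFINITY : heap.peek();
    }

    // Scores below the cutoff become 0; ties with the cutoff are all kept.
    static float[] zeroBelowWindow(float[] scores, int window) {
        float cutoff = cutoff(scores, window);
        float[] out = new float[scores.length];
        for (int i = 0; i < scores.length; i++) {
            out[i] = scores[i] >= cutoff ? scores[i] : 0f;
        }
        return out;
    }

    public static void main(String[] args) {
        float[] raw = {3f, 1f, 4f, 1f, 5f, 9f, 2f, 6f};
        System.out.println(Arrays.toString(zeroBelowWindow(raw, 3)));
    }
}
```

With only about 150K hits, skipping the heap and keeping every score in one flat array is simpler, as the message says.
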
>
> During collect, hang onto the docIds and scores and build your scaling
> info.
>
> During finish, iterate over your docIds and scale the scores as you go.
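
The two passes could be sketched roughly as follows. This is plain Java, not the DelegatingCollector API: `ScaleBuffer` and `Sink` are hypothetical names, and linear min/max scaling onto a target range is my assumption about what the "scaling info" holds.

```java
import java.util.Arrays;

// Hypothetical buffer, not a Solr class: pass 1 records hits, pass 2 scales them.
final class ScaleBuffer {
    // Stand-in for "send this doc to the delegate collector".
    interface Sink {
        void accept(int docId, float scaledScore);
    }

    private int[] docIds = new int[16];
    private float[] scores = new float[16];
    private int size;
    private float minScore = Float.POSITIVE_INFINITY;
    private float maxScore = Float.NEGATIVE_INFINITY;

    // Pass 1: remember the hit and update the scaling info (min and max).
    void collect(int docId, float score) {
        if (size == docIds.length) {
            docIds = Arrays.copyOf(docIds, size * 2);
            scores = Arrays.copyOf(scores, size * 2);
        }
        docIds[size] = docId;
        scores[size] = score;
        size++;
        minScore = Math.min(minScore, score);
        maxScore = Math.max(maxScore, score);
    }

    // Pass 2: map each raw score linearly from [minScore, maxScore]
    // onto [targetMin, targetMax] and hand it on.
    void finish(float targetMin, float targetMax, Sink sink) {
        float range = maxScore - minScore;
        for (int i = 0; i < size; i++) {
            float scaled = range == 0f
                    ? targetMin // all scores equal: avoid dividing by zero
                    : targetMin + (scores[i] - minScore) * (targetMax - targetMin) / range;
            sink.accept(docIds[i], scaled);
        }
    }

    public static void main(String[] args) {
        ScaleBuffer buffer = new ScaleBuffer();
        buffer.collect(1, 2f);
        buffer.collect(2, 4f);
        buffer.collect(3, 6f);
        buffer.finish(0f, 1f, (doc, score) -> System.out.println(doc + " -> " + score));
    }
}
```
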
>
> Set your scaled score into the object wrapper that is in the request
> context before you collect each document.
>
> When you call collect on the delegate collectors they will call the custom
> value source for each document to perform the sort. Your custom value
> source will return whatever the float value is in the request context at
> that time.
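
To check my understanding of the hand-off, here is a sketch in plain Java. A `HashMap` stands in for the request context, and `CurrentScore`, `CurrentScoreSource` and the key name are all hypothetical; the real value source would extend Solr's own classes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical "object wrapper around a mutable float".
final class CurrentScore {
    float value;
}

// Hypothetical stand-in for the custom value source: it ignores the docId
// and returns whatever score was most recently stored in the wrapper.
final class CurrentScoreSource {
    static final String CONTEXT_KEY = "current_scaled_score"; // assumed key name

    private final CurrentScore holder;

    CurrentScoreSource(Map<Object, Object> requestContext) {
        this.holder = (CurrentScore) requestContext.get(CONTEXT_KEY);
    }

    float floatVal(int docId) {
        return holder.value;
    }
}

final class CurrentScoreDemo {
    // Mimics the finish pass: set the wrapper, then "collect" the document.
    static float[] replay(int[] docIds, float[] scaledScores) {
        Map<Object, Object> requestContext = new HashMap<>();
        CurrentScore holder = new CurrentScore();
        requestContext.put(CurrentScoreSource.CONTEXT_KEY, holder);
        CurrentScoreSource source = new CurrentScoreSource(requestContext);

        float[] seen = new float[docIds.length];
        for (int i = 0; i < docIds.length; i++) {
            holder.value = scaledScores[i];       // set before collecting the doc
            seen[i] = source.floatVal(docIds[i]); // what the delegate's sort would read
        }
        return seen;
    }

    public static void main(String[] args) {
        for (float f : replay(new int[] {7, 3, 12}, new float[] {0.25f, 1.0f, 0.5f})) {
            System.out.println(f);
        }
    }
}
```

The ordering is what matters here: the wrapper has to be updated before the delegate sees each document, otherwise the value source returns the previous document's score.
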
>
> If you're also going to run this postfilter when you're doing a standard
> rank by score you'll also need to send down a dummy scorer to the delegate
> collectors. Spend some time with the CollapsingQParserPlugin in trunk to
> see how the dummy scorer works.
>
> I'll be adding value source collapse criteria to the
> CollapsingQParserPlugin this week and it will have a similar interaction
> between a PostFilter and value source. So you may want to watch SOLR-5536
> to see an example of this.
>
> Joel
>
> Joel Bernstein
> Search Engineer at Heliosearch
>
>
> On Mon, Dec 23, 2013 at 4:03 PM, Peter Keegan <peterlkee...@gmail.com> wrote:
>
> > Hi Joel,
> >
> > Could you clarify what would be in the key,value Map added to the
> > SearchRequest context? It seems that all the docId/score tuples need to
> > be there, including the ones not in the 'top N ScoreDocs' PriorityQueue
> > (score=0). If so would the Map be something like:
> > "scaled_scores",Map<Integer,Float> ?
> >
> > Also, what is the reason for passing score=0 for documents that aren't in
> > the PriorityQueue? Will these docs get filtered out before a normal sort
> > by score?
> >
> > Thanks,
> > Peter
> >
> >
> > On Thu, Dec 12, 2013 at 11:13 AM, Joel Bernstein <joels...@gmail.com>
> > wrote:
> >
> > > The sorting is going to happen in the lower level collectors. You need
> > > a value source that returns the score of the document being collected.
> > >
> > > Here is how you can make this happen:
> > >
> > > 1) Create an object in your PostFilter that simply holds the current
> > > score. Place this object in the SearchRequest context map. Update
> > > object.score as you pass the docs and scores to the lower collectors.
> > >
> > > 2) Create a value source that checks the SearchRequest context for the
> > > object that's holding the current score. Use this object to return the
> > > current score when called. For example, if you give the value source a
> > > handle called "score", a compound function call will look like this:
> > > sum(score(), field(x))
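
A toy model of how such a compound function would evaluate, assuming nothing about Solr's real ValueSource classes: `DocFunction`, `ScoreHolder` and the three factory methods are hypothetical, and a float array stands in for field x.

```java
// Hypothetical miniature of a function-query tree; not Solr's ValueSource API.
final class FunctionSketch {
    interface DocFunction {
        float value(int docId);
    }

    // Mutable holder that the PostFilter would update before each collect.
    static final class ScoreHolder {
        float score;
    }

    // score(): ignores the docId and returns whatever the holder contains now.
    static DocFunction score(ScoreHolder holder) {
        return docId -> holder.score;
    }

    // field(x): looks the value up per document (an array stands in for the field).
    static DocFunction field(float[] valuesByDocId) {
        return docId -> valuesByDocId[docId];
    }

    // sum(a, b): adds the two sub-function values for the same document.
    static DocFunction sum(DocFunction a, DocFunction b) {
        return docId -> a.value(docId) + b.value(docId);
    }

    public static void main(String[] args) {
        ScoreHolder holder = new ScoreHolder();
        DocFunction sortValue = sum(score(holder), field(new float[] {10f, 20f}));
        holder.score = 0.5f; // the PostFilter sets this just before doc 1 is collected
        System.out.println(sortValue.value(1));
    }
}
```
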
> > >
> > > Joel
> > >
> > > On Thu, Dec 12, 2013 at 9:58 AM, Peter Keegan <peterlkee...@gmail.com> wrote:
> > > >
> > > > Regarding my original goal, which is to perform a math function using
> > > > the scaled score and a field value, and sort on the result, how does
> > > > this fit in? Must I implement another custom PostFilter with a higher
> > > > cost than the scale PostFilter?
> > > >
> > > > Thanks,
> > > > Peter
> > > >
> > > >
> > > > On Wed, Dec 11, 2013 at 4:01 PM, Peter Keegan <peterlkee...@gmail.com> wrote:
> > > >
> > > > > Thanks very much for the guidance. I'd be happy to donate a working
> > > > > solution.
> > > > >
> > > > > Peter
> > > > >
> > > > >
> > > > > On Wed, Dec 11, 2013 at 3:53 PM, Joel Bernstein <joels...@gmail.com> wrote:
> > > > > >
> > > > > >> SOLR-5020 has the commit info; it's mainly changes to
> > > > > >> SolrIndexSearcher, I believe. They might apply to 4.3.
> > > > > >> I think as long as you have the finish method, that's all you'll
> > > > > >> need. If you can get this working, it would be excellent if you
> > > > > >> could donate back the Scale PostFilter.
> > > > >>
> > > > >>
> > > > > >> On Wed, Dec 11, 2013 at 3:36 PM, Peter Keegan <peterlkee...@gmail.com> wrote:
> > > > > >>
> > > > > >> > This is what I was looking for, but the DelegatingCollector
> > > > > >> > 'finish' method doesn't exist in 4.3.0 :(   Can this be patched
> > > > > >> > in and are there any other PostFilter dependencies on 4.5?
> > > > >> >
> > > > >> > Thanks,
> > > > >> > Peter
> > > > >> >
> > > > >> >
> > > > > >> > On Wed, Dec 11, 2013 at 3:16 PM, Joel Bernstein <joels...@gmail.com> wrote:
> > > > > >> >
> > > > > >> > > Here is one approach to use in a postfilter:
> > > > > >> > >
> > > > > >> > > 1) In the collect() method call score for each doc. Use the
> > > > > >> > > scores to create your scaleInfo.
> > > > > >> > > 2) Keep a bitset of the hits and a priorityQueue of your top X
> > > > > >> > > ScoreDocs.
> > > > > >> > > 3) Don't delegate any documents to lower collectors in the
> > > > > >> > > collect() method.
> > > > > >> > > 4) In the finish method create a score mapping (use the hppc
> > > > > >> > > IntFloatOpenHashMap) with your top X docIds pointing to their
> > > > > >> > > score, using the priorityQueue created in step 2. Then iterate
> > > > > >> > > the bitset (also created in step 2) sending down each doc to
> > > > > >> > > the lower collectors, retrieving and scaling the score from
> > > > > >> > > the score map. If the document is not in the score map then
> > > > > >> > > send down 0.
> > > > > >> > >
> > > > > >> > > You'll have to set up a dummy scorer to feed to lower collectors.
> > > > > >> > > The CollapsingQParserPlugin has an example of how to do this.
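
The four steps above could be sketched like this, in plain Java with none of the real Solr or Lucene types. `DeferredScaleCollector`, `DummyScorer` and `Delegate` are hypothetical stand-ins, a `HashMap` replaces the hppc map, and scaling onto [0, 1] from the min and max raw score is my assumption about the scaleInfo.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical sketch of the four steps; none of these are Solr or Lucene types.
final class DeferredScaleCollector {
    // Stand-in for the dummy scorer handed to the lower collectors.
    static final class DummyScorer {
        float score;
    }

    // Stand-in for a lower (delegate) collector.
    interface Delegate {
        void collect(int docId, DummyScorer scorer);
    }

    private static final class Hit {
        final int docId;
        final float score;

        Hit(int docId, float score) {
            this.docId = docId;
            this.score = score;
        }
    }

    private final int topX;
    private final BitSet hits = new BitSet(); // step 2: every matching doc
    private final PriorityQueue<Hit> top =    // step 2: min-heap of the top X
            new PriorityQueue<>((a, b) -> Float.compare(a.score, b.score));
    private float min = Float.POSITIVE_INFINITY; // step 1: scaleInfo
    private float max = Float.NEGATIVE_INFINITY;

    DeferredScaleCollector(int topX) {
        this.topX = topX;
    }

    // Steps 1-3: record the hit and its score, delegate nothing yet.
    void collect(int docId, float score) {
        min = Math.min(min, score);
        max = Math.max(max, score);
        hits.set(docId);
        top.add(new Hit(docId, score));
        if (top.size() > topX) {
            top.poll(); // drop the weakest so only the top X remain
        }
    }

    // Step 4: build the score map, then replay every hit in docId order.
    void finish(Delegate delegate) {
        Map<Integer, Float> topScores = new HashMap<>();
        for (Hit h : top) {
            topScores.put(h.docId, h.score);
        }
        float range = max - min;
        DummyScorer scorer = new DummyScorer();
        for (int doc = hits.nextSetBit(0); doc >= 0; doc = hits.nextSetBit(doc + 1)) {
            Float raw = topScores.get(doc);
            // Top-X scores are scaled onto [0, 1]; everything else goes down as 0.
            scorer.score = raw == null ? 0f : (range == 0f ? 1f : (raw - min) / range);
            delegate.collect(doc, scorer);
        }
    }

    public static void main(String[] args) {
        DeferredScaleCollector collector = new DeferredScaleCollector(2);
        collector.collect(5, 1f);
        collector.collect(2, 3f);
        collector.collect(9, 5f);
        List<String> seen = new ArrayList<>();
        collector.finish((doc, scorer) -> seen.add(doc + ":" + scorer.score));
        System.out.println(seen);
    }
}
```
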
> > > > >> > >
> > > > >> > >
> > > > >> > >
> > > > >> > >
> > > > > >> > > On Wed, Dec 11, 2013 at 2:05 PM, Peter Keegan <peterlkee...@gmail.com> wrote:
> > > > > >> > >
> > > > > >> > > > Hi Joel,
> > > > > >> > > >
> > > > > >> > > > I thought about using a PostFilter, but the problem is that
> > > > > >> > > > the 'scale' function must be done after all matching docs
> > > > > >> > > > have been scored but before adding them to the PriorityQueue
> > > > > >> > > > that sorts just the rows to be returned. Doing the 'scale'
> > > > > >> > > > function wrapped in a 'query' is proving to be too slow when
> > > > > >> > > > it visits every document in the index.
> > > > > >> > > >
> > > > > >> > > > In the Collector, I can see how to get the field values like
> > > > > >> > > > this:
> > > > > >> > > >
> > > > > >> > > > indexSearcher.getSchema().getField("field(myfield").getType().getValueSource(SchemaField,
> > > > > >> > > > QParser).getValues()
> > > > > >> > > >
> > > > > >> > > > But, 'getValueSource' needs a QParser, which isn't available.
> > > > > >> > > > And I can't create a QParser without a SolrQueryRequest, which
> > > > > >> > > > isn't available.
> > > > >> > > >
> > > > >> > > > Thanks,
> > > > >> > > > Peter
> > > > >> > > >
> > > > >> > > >
> > > > > >> > > > On Wed, Dec 11, 2013 at 1:48 PM, Joel Bernstein <joels...@gmail.com> wrote:
> > > > > >> > > >
> > > > > >> > > > > Peter,
> > > > > >> > > > >
> > > > > >> > > > > It sounds like you could achieve what you want to do in a
> > > > > >> > > > > PostFilter rather than extending the TopDocsCollector. Is
> > > > > >> > > > > there a reason why a PostFilter won't work for you?
> > > > >> > > > >
> > > > >> > > > > Joel
> > > > >> > > > >
> > > > >> > > > >
> > > > > >> > > > > On Tue, Dec 10, 2013 at 3:24 PM, Peter Keegan <peterlkee...@gmail.com> wrote:
> > > > > >> > > > >
> > > > > >> > > > > > Quick question:
> > > > > >> > > > > > In the context of a custom collector, how does one get the
> > > > > >> > > > > > values of a field of type 'ExternalFileField'?
> > > > >> > > > > >
> > > > >> > > > > > Thanks,
> > > > >> > > > > > Peter
> > > > >> > > > > >
> > > > >> > > > > >
> > > > > >> > > > > > On Tue, Dec 10, 2013 at 1:18 PM, Peter Keegan <peterlkee...@gmail.com> wrote:
> > > > > >> > > > > >
> > > > > >> > > > > > > Hi Joel,
> > > > > >> > > > > > >
> > > > > >> > > > > > > This is related to another thread on function query matching
> > > > > >> > > > > > > (http://lucene.472066.n3.nabble.com/Function-query-matching-td4099807.html#a4105513).
> > > > > >> > > > > > > The patch in SOLR-4465 will allow me to extend TopDocsCollector
> > > > > >> > > > > > > and perform the 'scale' function on only the documents matching
> > > > > >> > > > > > > the main dismax query. As you mention, it is a slightly
> > > > > >> > > > > > > intrusive design and requires that I manage my own
> > > > > >> > > > > > > PriorityQueue (and a local duplicate of HitQueue), but should
> > > > > >> > > > > > > work. I think a better design would hide the PQ from the
> > > > > >> > > > > > > plugin.
> > > > >> > > > > > >
> > > > >> > > > > > > Thanks,
> > > > >> > > > > > > Peter
> > > > >> > > > > > >
> > > > >> > > > > > >
> > > > > >> > > > > > > On Sun, Dec 8, 2013 at 5:32 PM, Joel Bernstein <joels...@gmail.com> wrote:
> > > > > >> > > > > > >
> > > > > >> > > > > > >> Hi Peter,
> > > > > >> > > > > > >>
> > > > > >> > > > > > >> I've been meaning to revisit configurable ranking collectors,
> > > > > >> > > > > > >> but I haven't yet had a chance. It's on the shortlist of
> > > > > >> > > > > > >> things I'd like to tackle though.
> > > > >> > > > > > >>
> > > > >> > > > > > >>
> > > > >> > > > > > >>
> > > > > >> > > > > > >> On Fri, Dec 6, 2013 at 4:17 PM, Peter Keegan <peterlkee...@gmail.com> wrote:
> > > > > >> > > > > > >>
> > > > > >> > > > > > >> > I looked at SOLR-4465 and SOLR-5045, where it appears that
> > > > > >> > > > > > >> > there is a goal to be able to do custom sorting and ranking
> > > > > >> > > > > > >> > in a PostFilter. So far, it looks like only custom
> > > > > >> > > > > > >> > aggregation can be implemented in PostFilter (5045). Custom
> > > > > >> > > > > > >> > sorting/ranking can be done in a pluggable collector (4465),
> > > > > >> > > > > > >> > but this patch is no longer in dev.
> > > > > >> > > > > > >> >
> > > > > >> > > > > > >> > Is there any other dev. being done on adding custom sorting
> > > > > >> > > > > > >> > (after collection) via a plugin?
> > > > >> > > > > > >> >
> > > > >> > > > > > >> > Thanks,
> > > > >> > > > > > >> > Peter
> > > > >> > > > > > >> >
> > > > >> > > > > > >>
> > > > >> > > > > > >>
> > > > >> > > > > > >>
> > > > >> > > > > > >> --
> > > > >> > > > > > >> Joel Bernstein
> > > > >> > > > > > >> Search Engineer at Heliosearch
> > > > >> > > > > > >>
> > > > >> > > > > > >
> > > > >> > > > > > >
> > > > >> > > > > >
> > > > >> > > > >
> > > > >> > > > >
> > > > >> > > > >
> > > > >> > > > > --
> > > > >> > > > > Joel Bernstein
> > > > >> > > > > Search Engineer at Heliosearch
> > > > >> > > > >
> > > > >> > > >
> > > > >> > >
> > > > >> > >
> > > > >> > >
> > > > >> > > --
> > > > >> > > Joel Bernstein
> > > > >> > > Search Engineer at Heliosearch
> > > > >> > >
> > > > >> >
> > > > >>
> > > > >>
> > > > >>
> > > > >> --
> > > > >> Joel Bernstein
> > > > >> Search Engineer at Heliosearch
> > > > >>
> > > > >
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Joel Bernstein
> > > Search Engineer at Heliosearch
> > >
> >
>
