Hi Joel,

Although I solved this issue with a custom CollectorFactory, I also have a solution that uses a PostFilter and an optional ValueSource. Could you take a look at SOLR-5831 and see if I've got this right?
Thanks,
Peter

On Mon, Dec 23, 2013 at 6:37 PM, Joel Bernstein <joels...@gmail.com> wrote:

> Peter,
>
> You actually only need the current score being collected to be in the
> request context. So you don't need a map, you just need an object wrapper
> around a mutable float.
>
> If you have a page size of X, only the top X scores need to be held onto,
> because all the other scores wouldn't have made it into that page anyway so
> they might as well be 0. Because the QueryResultCache caches a larger
> window than the page size you should keep enough scores so the cached
> docList is correct. But if you're only dealing with 150K of results you
> could just keep all the scores in a FloatArrayList and not worry about
> keeping the top X scores in a priority queue.
>
> During the collect hang onto the docIds and scores and build your scaling
> info.
>
> During the finish iterate your docIds and scale the scores as you go.
>
> Set your scaled score into the object wrapper that is in the request
> context before you collect each document.
>
> When you call collect on the delegate collectors they will call the custom
> value source for each document to perform the sort. Your custom value
> source will return whatever the float value is in the request context at
> that time.
>
> If you're also going to run this postfilter when you're doing a standard
> rank by score you'll also need to send down a dummy scorer to the delegate
> collectors. Spend some time with the CollapsingQParserPlugin in trunk to
> see how the dummy scorer works.
>
> I'll be adding value source collapse criteria to the
> CollapsingQParserPlugin this week and it will have a similar interaction
> between a PostFilter and value source. So you may want to watch SOLR-5536
> to see an example of this.
>
> Joel
>
> Joel Bernstein
> Search Engineer at Heliosearch
>
>
> On Mon, Dec 23, 2013 at 4:03 PM, Peter Keegan <peterlkee...@gmail.com> wrote:
>
> > Hi Joel,
> >
> > Could you clarify what would be in the key,value Map added to the
> > SearchRequest context? It seems that all the docId/score tuples need to be
> > there, including the ones not in the 'top N ScoreDocs' PriorityQueue
> > (score=0). If so would the Map be something like:
> > "scaled_scores",Map<Integer,Float> ?
> >
> > Also, what is the reason for passing score=0 for documents that aren't in
> > the PriorityQueue? Will these docs get filtered out before a normal sort by
> > score?
> >
> > Thanks,
> > Peter
> >
> >
> > On Thu, Dec 12, 2013 at 11:13 AM, Joel Bernstein <joels...@gmail.com> wrote:
> >
> > > The sorting is going to happen in the lower level collectors. You need a
> > > value source that returns the score of the document being collected.
> > >
> > > Here is how you can make this happen:
> > >
> > > 1) Create an object in your PostFilter that simply holds the current score.
> > > Place this object in the SearchRequest context map. Update object.score as
> > > you pass the docs and scores to the lower collectors.
> > >
> > > 2) Create a value source that checks the SearchRequest context for the
> > > object that's holding the current score. Use this object to return the
> > > current score when called. For example if you give the value source a
> > > handle called "score" a compound function call will look like this:
> > > sum(score(), field(x))
> > >
> > > Joel
> > >
> > >
> > > On Thu, Dec 12, 2013 at 9:58 AM, Peter Keegan <peterlkee...@gmail.com> wrote:
> > >
> > > > Regarding my original goal, which is to perform a math function using the
> > > > scaled score and a field value, and sort on the result, how does this fit
> > > > in?
> > > > Must I implement another custom PostFilter with a higher cost than the
> > > > scale PostFilter?
> > > >
> > > > Thanks,
> > > > Peter
> > > >
> > > >
> > > > On Wed, Dec 11, 2013 at 4:01 PM, Peter Keegan <peterlkee...@gmail.com> wrote:
> > > >
> > > > > Thanks very much for the guidance. I'd be happy to donate a working
> > > > > solution.
> > > > >
> > > > > Peter
> > > > >
> > > > >
> > > > > On Wed, Dec 11, 2013 at 3:53 PM, Joel Bernstein <joels...@gmail.com> wrote:
> > > > >
> > > > >> SOLR-5020 has the commit info, it's mainly changes to SolrIndexSearcher I
> > > > >> believe. They might apply to 4.3.
> > > > >> I think as long as you have the finish method that's all you'll need. If you
> > > > >> can get this working it would be excellent if you could donate back the
> > > > >> Scale PostFilter.
> > > > >>
> > > > >>
> > > > >> On Wed, Dec 11, 2013 at 3:36 PM, Peter Keegan <peterlkee...@gmail.com> wrote:
> > > > >>
> > > > >> > This is what I was looking for, but the DelegatingCollector 'finish' method
> > > > >> > doesn't exist in 4.3.0 :( Can this be patched in and are there any other
> > > > >> > PostFilter dependencies on 4.5?
> > > > >> >
> > > > >> > Thanks,
> > > > >> > Peter
> > > > >> >
> > > > >> >
> > > > >> > On Wed, Dec 11, 2013 at 3:16 PM, Joel Bernstein <joels...@gmail.com> wrote:
> > > > >> >
> > > > >> > > Here is one approach to use in a postfilter
> > > > >> > >
> > > > >> > > 1) In the collect() method call score for each doc. Use the scores to
> > > > >> > > create your scaleInfo.
> > > > >> > > 2) Keep a bitset of the hits and a priorityQueue of your top X ScoreDocs.
> > > > >> > > 3) Don't delegate any documents to lower collectors in the collect()
> > > > >> > > method.
> > > > >> > > 4) In the finish method create a score mapping (use the hppc
> > > > >> > > IntFloatOpenHashMap) with your top X docIds pointing to their score,
> > > > >> > > using the priorityQueue created in step 2. Then iterate the bitset (also
> > > > >> > > created in step 2) sending down each doc to the lower collectors, retrieving
> > > > >> > > and scaling the score from the score map. If the document is not in the
> > > > >> > > score map then send down 0.
> > > > >> > >
> > > > >> > > You'll have to set up a dummy scorer to feed to lower collectors. The
> > > > >> > > CollapsingQParserPlugin has an example of how to do this.
> > > > >> > >
> > > > >> > >
> > > > >> > > On Wed, Dec 11, 2013 at 2:05 PM, Peter Keegan <peterlkee...@gmail.com> wrote:
> > > > >> > >
> > > > >> > > > Hi Joel,
> > > > >> > > >
> > > > >> > > > I thought about using a PostFilter, but the problem is that the 'scale'
> > > > >> > > > function must be done after all matching docs have been scored but before
> > > > >> > > > adding them to the PriorityQueue that sorts just the rows to be returned.
> > > > >> > > > Doing the 'scale' function wrapped in a 'query' is proving to be too slow
> > > > >> > > > when it visits every document in the index.
> > > > >> > > >
> > > > >> > > > In the Collector, I can see how to get the field values like this:
> > > > >> > > >
> > > > >> > > > indexSearcher.getSchema().getField("field(myfield").getType().getValueSource(SchemaField,
> > > > >> > > > QParser).getValues()
> > > > >> > > >
> > > > >> > > > But, 'getValueSource' needs a QParser, which isn't available.
> > > > >> > > > And I can't create a QParser without a SolrQueryRequest, which isn't
> > > > >> > > > available.
> > > > >> > > >
> > > > >> > > > Thanks,
> > > > >> > > > Peter
> > > > >> > > >
> > > > >> > > >
> > > > >> > > > On Wed, Dec 11, 2013 at 1:48 PM, Joel Bernstein <joels...@gmail.com> wrote:
> > > > >> > > >
> > > > >> > > > > Peter,
> > > > >> > > > >
> > > > >> > > > > It sounds like you could achieve what you want to do in a PostFilter rather
> > > > >> > > > > than extending the TopDocsCollector. Is there a reason why a PostFilter
> > > > >> > > > > won't work for you?
> > > > >> > > > >
> > > > >> > > > > Joel
> > > > >> > > > >
> > > > >> > > > >
> > > > >> > > > > On Tue, Dec 10, 2013 at 3:24 PM, Peter Keegan <peterlkee...@gmail.com> wrote:
> > > > >> > > > >
> > > > >> > > > > > Quick question:
> > > > >> > > > > > In the context of a custom collector, how does one get the values of a
> > > > >> > > > > > field of type 'ExternalFileField'?
> > > > >> > > > > >
> > > > >> > > > > > Thanks,
> > > > >> > > > > > Peter
> > > > >> > > > > >
> > > > >> > > > > >
> > > > >> > > > > > On Tue, Dec 10, 2013 at 1:18 PM, Peter Keegan <peterlkee...@gmail.com> wrote:
> > > > >> > > > > >
> > > > >> > > > > > > Hi Joel,
> > > > >> > > > > > >
> > > > >> > > > > > > This is related to another thread on function query matching (
> > > > >> > > > > > > http://lucene.472066.n3.nabble.com/Function-query-matching-td4099807.html#a4105513
> > > > >> > > > > > > ).
> > > > >> > > > > > > The patch in SOLR-4465 will allow me to extend TopDocsCollector and perform
> > > > >> > > > > > > the 'scale' function on only the documents matching the main dismax query.
> > > > >> > > > > > > As you mention, it is a slightly intrusive design and requires that I
> > > > >> > > > > > > manage my own PriorityQueue (and a local duplicate of HitQueue), but should
> > > > >> > > > > > > work. I think a better design would hide the PQ from the plugin.
> > > > >> > > > > > >
> > > > >> > > > > > > Thanks,
> > > > >> > > > > > > Peter
> > > > >> > > > > > >
> > > > >> > > > > > >
> > > > >> > > > > > > On Sun, Dec 8, 2013 at 5:32 PM, Joel Bernstein <joels...@gmail.com> wrote:
> > > > >> > > > > > >
> > > > >> > > > > > >> Hi Peter,
> > > > >> > > > > > >>
> > > > >> > > > > > >> I've been meaning to revisit configurable ranking collectors, but I haven't
> > > > >> > > > > > >> yet had a chance. It's on the shortlist of things I'd like to tackle
> > > > >> > > > > > >> though.
> > > > >> > > > > > >>
> > > > >> > > > > > >>
> > > > >> > > > > > >> On Fri, Dec 6, 2013 at 4:17 PM, Peter Keegan <peterlkee...@gmail.com> wrote:
> > > > >> > > > > > >>
> > > > >> > > > > > >> > I looked at SOLR-4465 and SOLR-5045, where it appears that there is a goal
> > > > >> > > > > > >> > to be able to do custom sorting and ranking in a PostFilter.
> > > > >> > > > > > >> > So far, it
> > > > >> > > > > > >> > looks like only custom aggregation can be implemented in PostFilter (5045).
> > > > >> > > > > > >> > Custom sorting/ranking can be done in a pluggable collector (4465), but
> > > > >> > > > > > >> > this patch is no longer in dev.
> > > > >> > > > > > >> >
> > > > >> > > > > > >> > Is there any other dev. being done on adding custom sorting (after
> > > > >> > > > > > >> > collection) via a plugin?
> > > > >> > > > > > >> >
> > > > >> > > > > > >> > Thanks,
> > > > >> > > > > > >> > Peter
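
For reference, the flow Joel describes in the quoted thread (buffer the docIds and scores during collect, compute the scaling info, then in finish set each scaled score into a mutable holder kept in the request context before passing the doc to the delegate) can be sketched in plain Java. This is only an illustration under stated assumptions: none of the types below are Solr or Lucene classes, the names CurrentScore, DocCollector, CurrentScoreSource, ScalingCollector and the context key are hypothetical, and a simple min/max linear scale stands in for whatever 'scale' function is actually used. It also keeps every score, as Joel suggests is acceptable for smaller result sets, rather than a top-X priority queue.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins only: none of these types are Solr or Lucene classes.

/** Mutable float wrapper that would live in the request context. */
final class CurrentScore {
    float value;
}

/** Stand-in for a lower-level (delegate) collector. */
interface DocCollector {
    void collect(int docId);
}

/** Stand-in for the custom value source: returns whatever score is current. */
final class CurrentScoreSource {
    private final Map<String, Object> context;

    CurrentScoreSource(Map<String, Object> context) {
        this.context = context;
    }

    float floatVal(int docId) {
        // docId is ignored on purpose: the holder always describes the doc being collected.
        return ((CurrentScore) context.get(ScalingCollector.CONTEXT_KEY)).value;
    }
}

/** Buffers hits in collect(), then scales and replays them in finish(). */
final class ScalingCollector {
    static final String CONTEXT_KEY = "current_scaled_score"; // assumed key name

    private final List<Integer> docIds = new ArrayList<>();
    private final List<Float> scores = new ArrayList<>();
    private final CurrentScore holder = new CurrentScore();
    private final DocCollector delegate;
    private final float lo;
    private final float hi;
    private float min = Float.POSITIVE_INFINITY;
    private float max = Float.NEGATIVE_INFINITY;

    ScalingCollector(Map<String, Object> context, DocCollector delegate, float lo, float hi) {
        this.delegate = delegate;
        this.lo = lo;
        this.hi = hi;
        context.put(CONTEXT_KEY, holder); // publish the holder for the value source
    }

    /** Nothing is delegated here; hits are only buffered and min/max tracked. */
    void collect(int docId, float score) {
        docIds.add(docId);
        scores.add(score);
        min = Math.min(min, score);
        max = Math.max(max, score);
    }

    /** Sets the scaled score into the holder, then hands the doc to the delegate. */
    void finish() {
        float range = max - min;
        for (int i = 0; i < docIds.size(); i++) {
            float raw = scores.get(i);
            holder.value = (range == 0f) ? lo : lo + (raw - min) * (hi - lo) / range;
            delegate.collect(docIds.get(i));
        }
    }
}

final class ScaleSketch {
    static Map<Integer, Float> run() {
        Map<String, Object> context = new HashMap<>();
        CurrentScoreSource source = new CurrentScoreSource(context);
        Map<Integer, Float> seen = new LinkedHashMap<>();
        // The delegate asks the value source for the score of the doc it is given.
        DocCollector delegate = docId -> seen.put(docId, source.floatVal(docId));
        ScalingCollector collector = new ScalingCollector(context, delegate, 0f, 1f);
        collector.collect(10, 2.0f);
        collector.collect(11, 4.0f);
        collector.collect(12, 6.0f);
        collector.finish();
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints {10=0.0, 11=0.5, 12=1.0}
    }
}
```

The value source never looks anything up by docId; it simply reads the holder, which is why a single mutable float is enough and no Map<Integer,Float> is needed. In a real PostFilter the delegate would also need a dummy scorer when ranking by score, which this sketch leaves out.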