Re: Configurable collectors for custom ranking

2014-03-07 Thread Peter Keegan
Hi Joel, Although I solved this issue with a custom CollectorFactory, I also have a solution that uses a PostFilter and an optional ValueSource. Could you take a look at SOLR-5831 and see if I've got this right? Thanks, Peter On Mon, Dec 23, 2013 at 6:37 PM, Joel Bernstein wrote: > Peter, >

Re: Configurable collectors for custom ranking

2013-12-26 Thread Peter Keegan
In my case, the final function call looks something like this: sum(product($k1,score()),product($k2,field(x))) This means that all the scores would have to be scaled and passed down, not just the top N, because even a low score could be offset by a high value in 'field(x)'. Thanks, Peter On Mon, Dec
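
To illustrate the point above, here is a minimal Java sketch of the same arithmetic as sum(product($k1,score()),product($k2,field(x))). The class name, method name, and weights are hypothetical and chosen only for the example; it is not Solr code.

```java
// Hypothetical illustration only: plain Java, not a Solr ValueSource.
final class CombinedRank {
    /** Same arithmetic as sum(product($k1,score()),product($k2,field(x))). */
    static float combine(float k1, float scaledScore, float k2, float fieldValue) {
        return k1 * scaledScore + k2 * fieldValue;
    }

    public static void main(String[] args) {
        float k1 = 1.0f, k2 = 2.0f; // assumed weights
        float highScoreDoc = combine(k1, 0.9f, k2, 0.1f); // high score, low field value
        float lowScoreDoc = combine(k1, 0.2f, k2, 0.8f);  // low score, high field value
        // The low-scoring document ranks higher once the field value is weighed in.
        System.out.println(lowScoreDoc > highScoreDoc);
    }
}
```

With these assumed weights, a document outside the top N by score alone can still come out ahead, which is why truncating to the top N before combining would lose it.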

Re: Configurable collectors for custom ranking

2013-12-23 Thread Joel Bernstein
Peter, You actually only need the current score being collected to be in the request context. So you don't need a map, you just need an object wrapper around a mutable float. If you have a page size of X, only the top X scores need to be held onto, because all the other scores wouldn't have made
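
A minimal sketch of the "object wrapper around a mutable float" idea, assuming a plain java.util.Map stands in for the request context. The class names and the map key are hypothetical; the real Solr context and value source wiring are not shown.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical holder: one mutable float shared between writer and reader.
final class MutableFloat {
    float value;
}

final class ScoreContextDemo {
    static final String KEY = "current_score"; // assumed key name

    /** Simulates collect-time writes and value-source-style reads via a shared context. */
    static float[] collectAndRead(float[] scores) {
        Map<Object, Object> context = new HashMap<>(); // stand-in for the request context
        MutableFloat holder = new MutableFloat();
        context.put(KEY, holder);

        float[] seen = new float[scores.length];
        for (int i = 0; i < scores.length; i++) {
            holder.value = scores[i];                          // writer: set the current score
            seen[i] = ((MutableFloat) context.get(KEY)).value; // reader: look it up by key
        }
        return seen;
    }
}
```

Because the holder is overwritten per document, nothing grows with the number of hits; only the score of the document currently being collected is visible to the reader.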

Re: Configurable collectors for custom ranking

2013-12-23 Thread Peter Keegan
Hi Joel, Could you clarify what would be in the key,value Map added to the SearchRequest context? It seems that all the docId/score tuples need to be there, including the ones not in the 'top N ScoreDocs' PriorityQueue (score=0). If so would the Map be something like: "scaled_scores",Map ? Also,

Re: Configurable collectors for custom ranking

2013-12-21 Thread Joel Bernstein
Hi Peter, The fastest approach to doing this would be to keep parallel hppc FloatArrayList for the scores and IntArrayList for the docs. Just add the docs and scores at collect time and iterate them in the finish. You'll be using more memory, but if you're looking for best possible performance then t
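
The parallel-list idea can be sketched roughly as follows. This uses plain growable arrays as a stand-in for the hppc FloatArrayList and IntArrayList, and it assumes a simple min-max scaling to [0, 1] in the finish step; the class and method names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical buffer: docs[i] and scores[i] always describe the same hit.
final class ParallelHitBuffer {
    private int[] docs = new int[16];
    private float[] scores = new float[16];
    private int size = 0;
    private float min = Float.POSITIVE_INFINITY;
    private float max = Float.NEGATIVE_INFINITY;

    /** Called once per hit: append doc and score, track the score range. */
    void collect(int doc, float score) {
        if (size == docs.length) { // grow both arrays together
            docs = Arrays.copyOf(docs, size * 2);
            scores = Arrays.copyOf(scores, size * 2);
        }
        docs[size] = doc;
        scores[size] = score;
        size++;
        min = Math.min(min, score);
        max = Math.max(max, score);
    }

    /** Called after all hits: returns scores scaled to [0, 1], aligned with docs(). */
    float[] finish() {
        float range = max - min;
        float[] scaled = new float[size];
        for (int i = 0; i < size; i++) {
            scaled[i] = range == 0f ? 0f : (scores[i] - min) / range;
        }
        return scaled;
    }

    int[] docs() {
        return Arrays.copyOf(docs, size);
    }
}
```

The memory cost is one int and one float per hit, in exchange for a single sequential pass at finish time with no map lookups.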

Re: Configurable collectors for custom ranking

2013-12-19 Thread Peter Keegan
I implemented the PostFilter approach described by Joel. Just iterating over the OpenBitSet, even without the scaling or the HashMap lookup, added 30ms to a query time, which kinda surprised me. There were about 150K hits out of a total of 500K. Is OpenBitSet the best way to do this? Thanks, Peter

Re: Configurable collectors for custom ranking

2013-12-19 Thread Peter Keegan
In order to size the PriorityQueue, the result window size for the query is needed. This has been computed in the SolrIndexSearcher and is available in QueryCommand.getSupersetMaxDoc(), but doesn't seem to be available for the PostFilter in either the SolrParams or SolrQueryRequest. Is there a way to

Re: Configurable collectors for custom ranking

2013-12-12 Thread Joel Bernstein
Thanks, I agree this is powerful stuff. One of the reasons that I haven't gotten back to pluggable collectors is that I've been using PostFilters instead. When you start doing stuff with scores in postfilters you'll run into the bug in SOLR-5416. This will affect you when you use facets in combinatio

Re: Configurable collectors for custom ranking

2013-12-12 Thread Peter Keegan
This is pretty cool, and worthy of adding to Solr in Action (v2) and the other books. With function queries, flexible filter processing and caching, custom collectors, and post filters, there's a lot of flexibility here. Btw, the query times using a custom collector to scale/recompute scores is ex

Re: Configurable collectors for custom ranking

2013-12-12 Thread Joel Bernstein
The sorting is going to happen in the lower level collectors. You need a value source that returns the score of the document being collected. Here is how you can make this happen: 1) Create an object in your PostFilter that simply holds the current score. Place this object in the SearchRequest co

Re: Configurable collectors for custom ranking

2013-12-12 Thread Peter Keegan
Regarding my original goal, which is to perform a math function using the scaled score and a field value, and sort on the result, how does this fit in? Must I implement another custom PostFilter with a higher cost than the scale PostFilter? Thanks, Peter On Wed, Dec 11, 2013 at 4:01 PM, Peter Ke

Re: Configurable collectors for custom ranking

2013-12-11 Thread Peter Keegan
Thanks very much for the guidance. I'd be happy to donate a working solution. Peter On Wed, Dec 11, 2013 at 3:53 PM, Joel Bernstein wrote: > SOLR-5020 has the commit info, it's mainly changes to SolrIndexSearcher I > believe. They might apply to 4.3. > I think as long as you have the finish metho

Re: Configurable collectors for custom ranking

2013-12-11 Thread Joel Bernstein
SOLR-5020 has the commit info, it's mainly changes to SolrIndexSearcher I believe. They might apply to 4.3. I think as long as you have the finish method that's all you'll need. If you can get this working it would be excellent if you could donate back the Scale PostFilter. On Wed, Dec 11, 2013 at 3

Re: Configurable collectors for custom ranking

2013-12-11 Thread Peter Keegan
This is what I was looking for, but the DelegatingCollector 'finish' method doesn't exist in 4.3.0 :( Can this be patched in and are there any other PostFilter dependencies on 4.5? Thanks, Peter On Wed, Dec 11, 2013 at 3:16 PM, Joel Bernstein wrote: > Here is one approach to use in a postfil

Re: Configurable collectors for custom ranking

2013-12-11 Thread Joel Bernstein
Here is one approach to use in a postfilter 1) In the collect() method call score for each doc. Use the scores to create your scaleInfo. 2) Keep a bitset of the hits and a priorityQueue of your top X ScoreDocs. 3) Don't delegate any documents to lower collectors in the collect() method. 4) In the
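
Steps 1 to 3 above can be sketched roughly as follows. This is plain Java using java.util.BitSet and java.util.PriorityQueue rather than the Lucene/Solr classes, and every name in it is hypothetical. The finish step is left out because the description of step 4 is cut off in this preview.

```java
import java.util.BitSet;
import java.util.PriorityQueue;

// Hypothetical stand-in for the collect() side of a post filter collector.
final class TopHitsSketch {
    static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    private final int topX;
    private final BitSet hits = new BitSet();
    // Min-heap on score: the head is the weakest of the current top X.
    private final PriorityQueue<Hit> queue =
        new PriorityQueue<Hit>((a, b) -> Float.compare(a.score, b.score));
    float minScore = Float.POSITIVE_INFINITY; // assumed "scaleInfo": the score range
    float maxScore = Float.NEGATIVE_INFINITY;

    TopHitsSketch(int topX) { this.topX = topX; }

    void collect(int doc, float score) {
        // Step 1: use every score to build the scale information.
        minScore = Math.min(minScore, score);
        maxScore = Math.max(maxScore, score);
        // Step 2: remember the hit and keep only the top X by score.
        hits.set(doc);
        if (queue.size() < topX) {
            queue.add(new Hit(doc, score));
        } else if (topX > 0 && score > queue.peek().score) {
            queue.poll();
            queue.add(new Hit(doc, score));
        }
        // Step 3: nothing is delegated to a lower collector here.
    }

    int hitCount() { return hits.cardinality(); }
    int queued() { return queue.size(); }
    float lowestQueuedScore() { return queue.peek().score; }
}
```

Keeping the heap bounded at X entries means the per-hit cost stays at O(log X) while the bitset still records every match.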

Re: Configurable collectors for custom ranking

2013-12-11 Thread Peter Keegan
From the Collector context, I suppose I can access the FileFloatSource directly like this, although it's not generic: SchemaField field = indexSearcher.getSchema().getField(fieldName); dataDir = indexSearcher.getSchema().getResourceLoader().getDataDir(); ExternalFileField eff = (ExternalFileField

Re: Configurable collectors for custom ranking

2013-12-11 Thread Peter Keegan
Hi Joel, I thought about using a PostFilter, but the problem is that the 'scale' function must be done after all matching docs have been scored but before adding them to the PriorityQueue that sorts just the rows to be returned. Doing the 'scale' function wrapped in a 'query' is proving to be too

Re: Configurable collectors for custom ranking

2013-12-11 Thread Joel Bernstein
Peter, It sounds like you could achieve what you want to do in a PostFilter rather than extending the TopDocsCollector. Is there a reason why a PostFilter won't work for you? Joel On Tue, Dec 10, 2013 at 3:24 PM, Peter Keegan wrote: > Quick question: > In the context of a custom collector, how

Re: Configurable collectors for custom ranking

2013-12-10 Thread Peter Keegan
Quick question: In the context of a custom collector, how does one get the values of a field of type 'ExternalFileField'? Thanks, Peter On Tue, Dec 10, 2013 at 1:18 PM, Peter Keegan wrote: > Hi Joel, > > This is related to another thread on function query matching ( > http://lucene.472066.n3.na

Re: Configurable collectors for custom ranking

2013-12-10 Thread Peter Keegan
Hi Joel, This is related to another thread on function query matching ( http://lucene.472066.n3.nabble.com/Function-query-matching-td4099807.html#a4105513). The patch in SOLR-4465 will allow me to extend TopDocsCollector and perform the 'scale' function on only the documents matching the main dism

Re: Configurable collectors for custom ranking

2013-12-08 Thread Joel Bernstein
Hi Peter, I've been meaning to revisit configurable ranking collectors, but I haven't yet had a chance. It's on the shortlist of things I'd like to tackle though. On Fri, Dec 6, 2013 at 4:17 PM, Peter Keegan wrote: > I looked at SOLR-4465 and SOLR-5045, where it appears that there is a goal >