Re: Single call for distributed IDF?

Walter Underwood Tue, 24 Jan 2017 10:41:40 -0800

I know how to do it. You return df for each term and num_docs from each shard,
then recalculate idf from the summed totals. I wrote up how we did it in
Ultraseek XPA about ten years ago, though with MonkeyRank instead of global IDF.


https://observer.wunderwood.org/2007/04/04/progressive-reranking/

I was wondering why Solr makes a separate request to each shard for that 
information instead of piggybacking it on the original request.
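
For illustration, here is a minimal sketch of the merge step, assuming each
shard's response to the original request also carries its num_docs and a df
count per query term. The response layout, the function names, and the sample
numbers are all hypothetical, not Solr's actual wire format, and the BM25-style
idf formula is an assumption; substitute whatever your similarity uses.

```python
import math
from collections import Counter


def merge_shard_stats(shard_responses):
    """Sum per-shard num_docs and per-term df into collection-wide totals."""
    total_docs = 0
    global_df = Counter()
    for resp in shard_responses:
        total_docs += resp["num_docs"]
        global_df.update(resp["df"])  # adds counts term by term
    return total_docs, dict(global_df)


def global_idf(total_docs, df):
    """BM25-style idf (an assumed formula, not necessarily your similarity's)."""
    return math.log(1 + (total_docs - df + 0.5) / (df + 0.5))


# Hypothetical stats piggybacked on two shards' responses to one query.
shard_responses = [
    {"num_docs": 1000, "df": {"solr": 100, "idf": 5}},
    {"num_docs": 3000, "df": {"solr": 20, "idf": 45}},
]

total_docs, df = merge_shard_stats(shard_responses)
idf = {term: global_idf(total_docs, count) for term, count in df.items()}
```

With the global idf in hand, the merged results can be rescored without a
second round trip to the shards just for stats.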

wunder
Walter Underwood
[email protected]
http://observer.wunderwood.org/  (my blog)


> On Jan 24, 2017, at 10:34 AM, Joel Bernstein <[email protected]> wrote:
> 
> This may help out:
> https://github.com/apache/lucene-solr/blob/master/solr/solrj/src/java/org/apache/solr/client/solrj/io/stream/ScoreNodesStream.java#L208
> 
> This points to some code that calculates global idf for a list of terms.
> Not sure if this matches your use case. It seems to be very fast.
> 
> Joel Bernstein
> http://joelsolr.blogspot.com/
> 
> On Tue, Jan 24, 2017 at 1:09 PM, Walter Underwood <[email protected]>
> wrote:
> 
>> I tried running with the LRUStatsCache for global IDF, but the performance
>> penalty was pretty big. The 95th percentile response time went from 3.4
>> seconds to 13 seconds. Oops.
>> 
>> We should not need a separate call to get the tf and df stats. Those are
>> already calculated when doing the first request. I worked on a search
>> engine that did it that way twenty years ago.
>> 
>> In the past, there would have been an IP obstacle, but I think that is
>> resolved.
>> 
>> wunder
>> Walter Underwood
>> [email protected]
>> http://observer.wunderwood.org/  (my blog)
>> 
>> 
>>
