The usual reason for making a second call to fetch the stats for global IDF is to get around an Infoseek patent on the single-call version. That patent finally expired a couple of years ago, so there is no longer any reason to make the second call.
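For illustration, here is a minimal sketch of the single-call idea: each shard piggybacks its num_docs and per-term df on its normal response, and the coordinator sums those counts and recomputes IDF before merging the hits. This is not Solr's code. ShardResponse, merge_stats, global_idf, and rescore are hypothetical names, and the IDF and scoring formulas are simple stand-ins chosen for the example.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ShardResponse:
    # Hypothetical reply from one shard: its hits plus the raw counts
    # needed to compute IDF over the whole collection.
    num_docs: int
    doc_freq: dict                              # term -> df within this shard
    hits: list = field(default_factory=list)    # (doc_id, {term: tf}) pairs


def merge_stats(responses):
    """Sum num_docs and per-term df across all shard responses."""
    total_docs = sum(r.num_docs for r in responses)
    global_df = Counter()
    for r in responses:
        global_df.update(r.doc_freq)
    return total_docs, global_df


def global_idf(total_docs, df):
    # Smoothed log IDF, picked for illustration only (an assumption,
    # not necessarily the formula Solr's similarity uses).
    return 1.0 + math.log((total_docs + 1) / (df + 1))


def rescore(responses, query_terms):
    """Recompute every hit's score with global IDF; return one merged ranking."""
    total_docs, global_df = merge_stats(responses)
    idf = {t: global_idf(total_docs, global_df[t]) for t in query_terms}
    scored = []
    for r in responses:
        for doc_id, tf in r.hits:
            # Toy score: sqrt(tf) * idf summed over the query terms.
            score = sum(math.sqrt(tf.get(t, 0)) * idf[t] for t in query_terms)
            scored.append((score, doc_id))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored]
```

The point of the design is that the counts travel with the hits, so the coordinator needs only one round trip per shard; the cost is that each shard has to return raw tf values (or an equivalent) so scores can be recomputed after the merge.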
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)

> On Jan 24, 2017, at 11:01 AM, Walter Underwood <wun...@wunderwood.org> wrote:
>
> Specifically, I’m talking about this:
>
> <statsCache class="org.apache.solr.search.stats.LRUStatsCache"/>
>
> Adding that line increased our 95th percentile response time by 10 seconds.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/ (my blog)
>
>> On Jan 24, 2017, at 10:43 AM, Joel Bernstein <joels...@gmail.com> wrote:
>>
>> Ah, I thought you were just interested in a fast way to get at IDF. This
>> approach does take a callback but it's really fast.
>>
>> Joel Bernstein
>> http://joelsolr.blogspot.com/
>>
>> On Tue, Jan 24, 2017 at 1:39 PM, Walter Underwood <wun...@wunderwood.org>
>> wrote:
>>
>>> I know how to do it. You return df for each term and num_docs then
>>> recalculate idf. I wrote up how we did it in Ultraseek XPA about ten years
>>> ago, though with MonkeyRank instead of global IDF.
>>>
>>> https://observer.wunderwood.org/2007/04/04/progressive-reranking/
>>>
>>> I was wondering why Solr makes a separate request to each shard for that
>>> information instead of piggybacking it on the original request.
>>>
>>> wunder
>>> Walter Underwood
>>> wun...@wunderwood.org
>>> http://observer.wunderwood.org/ (my blog)
>>>
>>>> On Jan 24, 2017, at 10:34 AM, Joel Bernstein <joels...@gmail.com> wrote:
>>>>
>>>> This may help out:
>>>> https://github.com/apache/lucene-solr/blob/master/solr/solrj/src/java/org/apache/solr/client/solrj/io/stream/ScoreNodesStream.java#L208
>>>>
>>>> This points to some code that calculates global idf for a list of terms.
>>>> Not sure if this matches your use case. It seems to be very fast.
>>>>
>>>> Joel Bernstein
>>>> http://joelsolr.blogspot.com/
>>>>
>>>> On Tue, Jan 24, 2017 at 1:09 PM, Walter Underwood <wun...@wunderwood.org>
>>>> wrote:
>>>>
>>>>> I tried running with the LRUStatsCache for global IDF, but the performance
>>>>> penalty was pretty big. The 95th percentile response time went from 3.4
>>>>> seconds to 13 seconds. Oops.
>>>>>
>>>>> We should not need a separate call to get the tf and df stats. Those are
>>>>> already calculated when doing the first request. I worked on a search
>>>>> engine that did it that way twenty years ago.
>>>>>
>>>>> In the past, there would have been an IP obstacle, but I think that is
>>>>> resolved.
>>>>>
>>>>> wunder
>>>>> Walter Underwood
>>>>> wun...@wunderwood.org
>>>>> http://observer.wunderwood.org/ (my blog)