Ah, I thought you were just interested in a fast way to get at IDF. This approach does take a callback but it's really fast.
Joel Bernstein
http://joelsolr.blogspot.com/

On Tue, Jan 24, 2017 at 1:39 PM, Walter Underwood <wun...@wunderwood.org> wrote:

> I know how to do it. You return df for each term and num_docs then
> recalculate idf. I wrote up how we did it in Ultraseek XPA about ten years
> ago, though with MonkeyRank instead of global IDF.
>
> https://observer.wunderwood.org/2007/04/04/progressive-reranking/
>
> I was wondering why Solr makes a separate request to each shard for that
> information instead of piggybacking it on the original request.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/ (my blog)
>
> > On Jan 24, 2017, at 10:34 AM, Joel Bernstein <joels...@gmail.com> wrote:
> >
> > This may help out:
> > https://github.com/apache/lucene-solr/blob/master/solr/solrj/src/java/org/apache/solr/client/solrj/io/stream/ScoreNodesStream.java#L208
> >
> > This points to some code that calculates global idf for a list of terms.
> > Not sure if this matches your use case. It seems to be very fast.
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> > On Tue, Jan 24, 2017 at 1:09 PM, Walter Underwood <wun...@wunderwood.org>
> > wrote:
> >
> >> I tried running with the LRUStatsCache for global IDF, but the performance
> >> penalty was pretty big. The 95th percentile response time went from 3.4
> >> seconds to 13 seconds. Oops.
> >>
> >> We should not need a separate call to get the tf and df stats. Those are
> >> already calculated when doing the first request. I worked on a search
> >> engine that did it that way twenty years ago.
> >>
> >> In the past, there would have been an IP obstacle, but I think that is
> >> resolved.
> >>
> >> wunder
> >> Walter Underwood
> >> wun...@wunderwood.org
> >> http://observer.wunderwood.org/ (my blog)
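
For anyone following the thread, here is a rough sketch of the merge Walter describes in the quoted message: each shard returns df per term plus its num_docs, and the coordinator sums them and recalculates idf. This is only an illustration, not Solr's or Ultraseek's actual code. The function name `global_idf`, the shape of `shard_stats`, the sample numbers, and the smoothed log formula are all assumptions made for the example.

```python
import math
from collections import Counter


def global_idf(shard_stats):
    """Merge per-shard statistics and recompute idf over the whole collection.

    shard_stats is a hypothetical list of (num_docs, {term: df}) tuples,
    one tuple per shard response.
    """
    total_docs = 0
    total_df = Counter()
    for num_docs, df_by_term in shard_stats:
        total_docs += num_docs
        # Counter.update adds counts, so df values are summed across shards.
        total_df.update(df_by_term)

    # Assumed formula: a smoothed log idf, 1 + ln((N + 1) / (df + 1)).
    return {
        term: 1.0 + math.log((total_docs + 1) / (df + 1))
        for term, df in total_df.items()
    }


# Made-up statistics from two shards.
shards = [
    (1000, {"solr": 100, "idf": 10}),
    (3000, {"solr": 50}),
]
idf = global_idf(shards)
```

Because only df and num_docs are needed, the statistics are small enough to ride along with the normal shard response, which is the piggybacking Walter is asking about.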