Specifically, I’m talking about this:

<statsCache class="org.apache.solr.search.stats.LRUStatsCache"/>

Adding that line increased our 95th percentile response time by 10 seconds.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Jan 24, 2017, at 10:43 AM, Joel Bernstein <joels...@gmail.com> wrote:
>
> Ah, I thought you were just interested in a fast way to get at IDF. This
> approach does take a callback but it's really fast.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Tue, Jan 24, 2017 at 1:39 PM, Walter Underwood <wun...@wunderwood.org>
> wrote:
>
>> I know how to do it. You return df for each term and num_docs then
>> recalculate idf. I wrote up how we did it in Ultraseek XPA about ten years
>> ago, though with MonkeyRank instead of global IDF.
>>
>> https://observer.wunderwood.org/2007/04/04/progressive-reranking/
>>
>> I was wondering why Solr makes a separate request to each shard for that
>> information instead of piggybacking it on the original request.
>>
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>>
>>
>>> On Jan 24, 2017, at 10:34 AM, Joel Bernstein <joels...@gmail.com> wrote:
>>>
>>> This may help out:
>>> https://github.com/apache/lucene-solr/blob/master/solr/solrj/src/java/org/apache/solr/client/solrj/io/stream/ScoreNodesStream.java#L208
>>>
>>> This points to some code that calculates global idf for a list of terms.
>>> Not sure if this matches your use case. It seems to be very fast.
>>>
>>> Joel Bernstein
>>> http://joelsolr.blogspot.com/
>>>
>>> On Tue, Jan 24, 2017 at 1:09 PM, Walter Underwood <wun...@wunderwood.org>
>>> wrote:
>>>
>>>> I tried running with the LRUStatsCache for global IDF, but the performance
>>>> penalty was pretty big. The 95th percentile response time went from 3.4
>>>> seconds to 13 seconds. Oops.
>>>>
>>>> We should not need a separate call to get the tf and df stats. Those are
>>>> already calculated when doing the first request. I worked on a search
>>>> engine that did it that way twenty years ago.
>>>>
>>>> In the past, there would have been an IP obstacle, but I think that is
>>>> resolved.
>>>>
>>>> wunder
>>>> Walter Underwood
>>>> wun...@wunderwood.org
>>>> http://observer.wunderwood.org/  (my blog)
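
For anyone following the thread, here is a rough sketch of the "return df for each term and num_docs, then recalculate idf" idea. It is not Solr code. The function names and the shape of the per-shard stats are made up for illustration, and the BM25-style idf formula is an assumption; substitute whatever your similarity uses.

```python
import math
from collections import Counter


def merge_shard_stats(shard_stats):
    """Sum num_docs and per-term df over all shards.

    shard_stats is a hypothetical shape: an iterable of
    (num_docs, {term: df}) pairs, one pair per shard response.
    """
    total_docs = 0
    total_df = Counter()
    for num_docs, term_df in shard_stats:
        total_docs += num_docs
        total_df.update(term_df)  # adds each shard's df to the running total
    return total_docs, dict(total_df)


def global_idf(total_docs, df):
    """BM25-style idf from collection-wide counts (assumed formula)."""
    return math.log(1.0 + (total_docs - df + 0.5) / (df + 0.5))


# Two shards; the second shard has no documents containing "cache".
shards = [
    (1000, {"solr": 10, "cache": 200}),
    (3000, {"solr": 30}),
]
total_docs, df = merge_shard_stats(shards)
idf = {term: global_idf(total_docs, n) for term, n in df.items()}
```

The point is that the merge only needs two small numbers per term per shard, so in principle they could ride along with the first response instead of needing a separate round trip.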