Reading your blogs now.

Joel Bernstein
http://joelsolr.blogspot.com/

On Tue, Jan 24, 2017 at 3:28 PM, Joel Bernstein <joels...@gmail.com> wrote:

> Ok my mistake, I was thinking you were writing your own component and
> needed a fast way to get global IDF. You're looking for fast global IDF
> during the scoring it sounds like. That seems like a reasonable thing to
> want.
>
> In the piggy backing approach you mention does the aggregator node parse
> the query and fetch the IDF, then pass it along to the shards?
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Tue, Jan 24, 2017 at 2:01 PM, Walter Underwood <wun...@wunderwood.org>
> wrote:
>
>> Specifically, I’m talking about this:
>>
>> <statsCache class="org.apache.solr.search.stats.LRUStatsCache"/>
>>
>> Adding that line increased our 95th percentile response time by 10
>> seconds.
>>
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/ (my blog)
>>
>> > On Jan 24, 2017, at 10:43 AM, Joel Bernstein <joels...@gmail.com> wrote:
>> >
>> > Ah, I thought you were just interested in a fast way to get at IDF. This
>> > approach does take a callback but it's really fast.
>> >
>> > Joel Bernstein
>> > http://joelsolr.blogspot.com/
>> >
>> > On Tue, Jan 24, 2017 at 1:39 PM, Walter Underwood <wun...@wunderwood.org>
>> > wrote:
>> >
>> >> I know how to do it. You return df for each term and num_docs then
>> >> recalculate idf. I wrote up how we did it in Ultraseek XPA about ten years
>> >> ago, though with MonkeyRank instead of global IDF.
>> >>
>> >> https://observer.wunderwood.org/2007/04/04/progressive-reranking/
>> >>
>> >> I was wondering why Solr makes a separate request to each shard for that
>> >> information instead of piggybacking it on the original request.
>> >>
>> >> wunder
>> >> Walter Underwood
>> >> wun...@wunderwood.org
>> >> http://observer.wunderwood.org/ (my blog)
>> >>
>> >>> On Jan 24, 2017, at 10:34 AM, Joel Bernstein <joels...@gmail.com> wrote:
>> >>>
>> >>> This may help out:
>> >>> https://github.com/apache/lucene-solr/blob/master/solr/solrj/src/java/org/apache/solr/client/solrj/io/stream/ScoreNodesStream.java#L208
>> >>>
>> >>> This points to some code that calculates global idf for a list of terms.
>> >>> Not sure if this matches your use case. It seems to be very fast.
>> >>>
>> >>> Joel Bernstein
>> >>> http://joelsolr.blogspot.com/
>> >>>
>> >>> On Tue, Jan 24, 2017 at 1:09 PM, Walter Underwood <wun...@wunderwood.org>
>> >>> wrote:
>> >>>
>> >>>> I tried running with the LRUStatsCache for global IDF, but the performance
>> >>>> penalty was pretty big. The 95th percentile response time went from 3.4
>> >>>> seconds to 13 seconds. Oops.
>> >>>>
>> >>>> We should not need a separate call to get the tf and df stats. Those are
>> >>>> already calculated when doing the first request. I worked on a search
>> >>>> engine that did it that way twenty years ago.
>> >>>>
>> >>>> In the past, there would have been an IP obstacle, but I think that is
>> >>>> resolved.
>> >>>>
>> >>>> wunder
>> >>>> Walter Underwood
>> >>>> wun...@wunderwood.org
>> >>>> http://observer.wunderwood.org/ (my blog)
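
The merge step Walter describes in the thread ("You return df for each term and num_docs then recalculate idf") can be sketched roughly as follows. This is only an illustration, not Solr's implementation: `ShardStats`, `merge_stats`, and `global_idf` are hypothetical names, the per-shard numbers are made up, and the smoothed log formula is an assumed stand-in for whatever the configured similarity actually computes.

```python
import math
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ShardStats:
    """Term statistics a shard could return alongside its hits (hypothetical shape)."""
    num_docs: int
    doc_freq: Dict[str, int] = field(default_factory=dict)


def merge_stats(shards: List[ShardStats]) -> Tuple[int, Dict[str, int]]:
    """Sum num_docs and per-term df across all shard responses."""
    total_docs = sum(s.num_docs for s in shards)
    total_df: Counter = Counter()
    for s in shards:
        total_df.update(s.doc_freq)  # adds counts term by term
    return total_docs, dict(total_df)


def global_idf(total_docs: int, total_df: Dict[str, int]) -> Dict[str, float]:
    """Recalculate idf per term from the merged counts.

    Assumed formula for illustration: ln((N + 1) / (df + 1)) + 1.
    """
    return {
        term: math.log((total_docs + 1) / (df + 1)) + 1.0
        for term, df in total_df.items()
    }


# Made-up statistics for two shards; "idf" never occurs on the second shard.
shards = [
    ShardStats(num_docs=1000, doc_freq={"solr": 100, "idf": 5}),
    ShardStats(num_docs=3000, doc_freq={"solr": 20}),
]
total_docs, total_df = merge_stats(shards)
idf = global_idf(total_docs, total_df)
```

Because the counts are additive, the aggregator only needs each shard's df and num_docs to arrive with the first response; no extra round trip is required to compute the merged values.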