Ah, I thought you were just interested in a fast way to get at IDF. This
approach does take a callback but it's really fast.

Joel Bernstein
http://joelsolr.blogspot.com/

On Tue, Jan 24, 2017 at 1:39 PM, Walter Underwood <wun...@wunderwood.org>
wrote:

> I know how to do it. You return df for each term and num_docs then
> recalculate idf. I wrote up how we did it in Ultraseek XPA about ten years
> ago, though with MonkeyRank instead of global IDF.
>
> https://observer.wunderwood.org/2007/04/04/progressive-reranking/
>
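For illustration, the merge step described above (each shard returns df per term plus its document count, and the coordinator sums them and recalculates idf) might look roughly like the sketch below. The response layout (`num_docs`, `df`) and the function names are hypothetical and not Solr's actual shard response format, and the smoothed logarithmic idf is only one common choice, not necessarily the formula a given similarity uses.

```python
import math
from collections import Counter


def merge_shard_stats(shard_responses):
    """Sum document counts and per-term document frequencies across shards.

    Each response is assumed (hypothetically) to look like:
        {"num_docs": 100, "df": {"term": 10, ...}}
    """
    total_docs = 0
    global_df = Counter()
    for resp in shard_responses:
        total_docs += resp["num_docs"]
        global_df.update(resp["df"])  # adds counts term by term
    return total_docs, global_df


def global_idf(total_docs, df):
    """Smoothed idf: 1 + ln((N + 1) / (df + 1)). Illustrative formula only."""
    return 1.0 + math.log((total_docs + 1) / (df + 1))


# Two shards reporting stats alongside their normal results.
shards = [
    {"num_docs": 100, "df": {"solr": 10, "monkeyrank": 1}},
    {"num_docs": 300, "df": {"solr": 30}},
]
total_docs, df = merge_shard_stats(shards)
idf = {term: global_idf(total_docs, count) for term, count in df.items()}
```

Because the shard-level numbers are plain sums, they can ride along with the first response and be merged on the coordinator without another round trip.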
> I was wondering why Solr makes a separate request to each shard for that
> information instead of piggybacking it on the original request.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>
> > On Jan 24, 2017, at 10:34 AM, Joel Bernstein <joels...@gmail.com> wrote:
> >
> > This may help out:
> > https://github.com/apache/lucene-solr/blob/master/solr/solrj/src/java/org/apache/solr/client/solrj/io/stream/ScoreNodesStream.java#L208
> >
> > This points to some code that calculates global idf for a list of terms.
> > Not sure if this matches your use case. It seems to be very fast.
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> > On Tue, Jan 24, 2017 at 1:09 PM, Walter Underwood <wun...@wunderwood.org>
> > wrote:
> >
> >> I tried running with the LRUStatsCache for global IDF, but the performance
> >> penalty was pretty big. The 95th percentile response time went from 3.4
> >> seconds to 13 seconds. Oops.
> >>
> >> We should not need a separate call to get the tf and df stats. Those are
> >> already calculated when doing the first request. I worked on a search
> >> engine that did it that way twenty years ago.
> >>
> >> In the past, there would have been an IP obstacle, but I think that is
> >> resolved.
> >>
> >> wunder
> >> Walter Underwood
> >> wun...@wunderwood.org
> >> http://observer.wunderwood.org/  (my blog)
> >>
> >>
> >>
>
>
