Re: Single call for distributed IDF?

Joel Bernstein Tue, 24 Jan 2017 12:29:22 -0800

Ok, my mistake. I was thinking you were writing your own component and
needed a fast way to get global IDF. It sounds like you're looking for fast
global IDF during scoring. That seems like a reasonable thing to want.
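Once the per-shard numbers are in hand, the global IDF itself is cheap to compute. Walter's recipe further down the thread is to have each shard return df for each term plus num_docs, then recalculate idf from the sums. Here is a minimal Java sketch of that merge step. The class, record, and method names are hypothetical, and the idf formula is an assumption modeled on the one in Lucene's BM25Similarity. It is not code taken from Solr's stats caches or from ScoreNodesStream.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GlobalIdf {

    // Hypothetical per-shard statistics: document count plus df for each query term.
    record ShardStats(long numDocs, Map<String, Long> docFreq) {}

    // BM25-style idf, same shape as Lucene's BM25Similarity (an assumption, not Solr's code).
    static double idf(long docFreq, long numDocs) {
        return Math.log(1.0 + (numDocs - docFreq + 0.5) / (docFreq + 0.5));
    }

    // Sum numDocs and per-term df over all shards, then compute one idf per term.
    static Map<String, Double> globalIdf(List<ShardStats> shards) {
        long totalDocs = 0;
        Map<String, Long> totalDf = new HashMap<>();
        for (ShardStats shard : shards) {
            totalDocs += shard.numDocs();
            shard.docFreq().forEach((term, df) -> totalDf.merge(term, df, Long::sum));
        }
        Map<String, Double> result = new HashMap<>();
        for (Map.Entry<String, Long> e : totalDf.entrySet()) {
            result.put(e.getKey(), idf(e.getValue(), totalDocs));
        }
        return result;
    }

    public static void main(String[] args) {
        // Two made-up shards: "solr" looks rare on the first shard but is common overall.
        List<ShardStats> shards = List.of(
                new ShardStats(1000, Map.of("solr", 10L, "idf", 1L)),
                new ShardStats(3000, Map.of("solr", 290L, "idf", 3L)));
        Map<String, Double> global = globalIdf(shards);
        System.out.printf("local idf(solr) on shard 1 = %.3f%n", idf(10, 1000));
        System.out.printf("global idf(solr)           = %.3f%n", global.get("solr"));
        System.out.printf("global idf(idf)            = %.3f%n", global.get("idf"));
    }
}
```

With these made-up numbers, "solr" scores a noticeably higher idf on shard 1 alone than across the whole collection. That kind of per-shard skew is what global IDF is meant to remove.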


In the piggybacking approach you mention, does the aggregator node parse
the query and fetch the IDF, then pass it along to the shards?
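The single-round-trip ("piggybacking") variant discussed in this thread could look roughly like this, assuming each shard's normal response also carries its num_docs, the df of every query term, and the raw tf of each query term for every hit it returns. All names here (PiggybackMerge, ShardResponse, Hit, mergeAndRescore) are hypothetical and not part of any Solr API. The score is a deliberately simplified tf-idf (sqrt(tf) times a BM25-style idf, with no length norms, boosts, or BM25 saturation).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PiggybackMerge {

    // Hypothetical: one returned document with raw tf for each query term it contains.
    record Hit(String docId, Map<String, Integer> termFreqs) {}

    // Hypothetical single-round-trip shard response: hits plus the shard's own stats.
    record ShardResponse(List<Hit> hits, long numDocs, Map<String, Long> docFreqs) {}

    record ScoredHit(String docId, double score) {}

    // BM25-style idf; stays finite even when df is 0.
    static double idf(long df, long numDocs) {
        return Math.log(1.0 + (numDocs - df + 0.5) / (df + 0.5));
    }

    // Merge stats from every response, rescore all returned hits with global idf, keep topK.
    static List<ScoredHit> mergeAndRescore(List<ShardResponse> responses, int topK) {
        long totalDocs = 0;
        Map<String, Long> totalDf = new HashMap<>();
        for (ShardResponse r : responses) {
            totalDocs += r.numDocs();
            r.docFreqs().forEach((term, df) -> totalDf.merge(term, df, Long::sum));
        }
        List<ScoredHit> scored = new ArrayList<>();
        for (ShardResponse r : responses) {
            for (Hit h : r.hits()) {
                double score = 0.0;
                for (Map.Entry<String, Integer> e : h.termFreqs().entrySet()) {
                    long df = totalDf.getOrDefault(e.getKey(), 0L);
                    // Simplified tf-idf: sqrt(tf) * global idf, no length norms or boosts.
                    score += Math.sqrt(e.getValue()) * idf(df, totalDocs);
                }
                scored.add(new ScoredHit(h.docId(), score));
            }
        }
        scored.sort(Comparator.comparingDouble(ScoredHit::score).reversed());
        return new ArrayList<>(scored.subList(0, Math.min(topK, scored.size())));
    }

    public static void main(String[] args) {
        ShardResponse shard1 = new ShardResponse(
                List.of(new Hit("a1", Map.of("solr", 3)), new Hit("a2", Map.of("idf", 1))),
                1000, Map.of("solr", 10L, "idf", 1L));
        ShardResponse shard2 = new ShardResponse(
                List.of(new Hit("b1", Map.of("solr", 5))),
                3000, Map.of("solr", 290L, "idf", 3L));
        // With these made-up numbers the order is a2, b1, a1.
        for (ScoredHit h : mergeAndRescore(List.of(shard1, shard2), 10)) {
            System.out.printf("%s %.3f%n", h.docId(), h.score());
        }
    }
}
```

One caveat with this design: each shard still chooses its candidate hits using its local statistics, so the aggregator only reorders what came back. A document that would rank highly under global idf but missed a shard's local top hits is never seen. Asking shards for somewhat more than topK candidates is one way to soften that, at the cost of larger responses.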



Joel Bernstein
http://joelsolr.blogspot.com/

On Tue, Jan 24, 2017 at 2:01 PM, Walter Underwood <[email protected]> wrote:

> Specifically, I’m talking about this:
>
>     <statsCache class="org.apache.solr.search.stats.LRUStatsCache"/>
>
> Adding that line increased our 95th percentile response time by 10 seconds.
>
> wunder
> Walter Underwood
> [email protected]
> http://observer.wunderwood.org/  (my blog)
>
>
> > On Jan 24, 2017, at 10:43 AM, Joel Bernstein <[email protected]> wrote:
> >
> > Ah, I thought you were just interested in a fast way to get at IDF. This
> > approach does take a call back to the shards, but it's really fast.
> >
> > On Tue, Jan 24, 2017 at 1:39 PM, Walter Underwood <[email protected]> wrote:
> >
> >> I know how to do it. You return df for each term and num_docs, then
> >> recalculate idf. I wrote up how we did it in Ultraseek XPA about ten years
> >> ago, though with MonkeyRank instead of global IDF.
> >>
> >> https://observer.wunderwood.org/2007/04/04/progressive-reranking/
> >>
> >> I was wondering why Solr makes a separate request to each shard for that
> >> information instead of piggybacking it on the original request.
> >>
> >>> On Jan 24, 2017, at 10:34 AM, Joel Bernstein <[email protected]> wrote:
> >>>
> >>> This may help out:
> >>> https://github.com/apache/lucene-solr/blob/master/solr/solrj/src/java/org/apache/solr/client/solrj/io/stream/ScoreNodesStream.java#L208
> >>>
> >>> This points to some code that calculates global idf for a list of terms.
> >>> Not sure if this matches your use case. It seems to be very fast.
> >>>
> >>> On Tue, Jan 24, 2017 at 1:09 PM, Walter Underwood <[email protected]> wrote:
> >>>
> >>>> I tried running with the LRUStatsCache for global IDF, but the performance
> >>>> penalty was pretty big. The 95th percentile response time went from 3.4
> >>>> seconds to 13 seconds. Oops.
> >>>>
> >>>> We should not need a separate call to get the tf and df stats. Those are
> >>>> already calculated when doing the first request. I worked on a search
> >>>> engine that did it that way twenty years ago.
> >>>>
> >>>> In the past, there would have been an IP obstacle, but I think that is
> >>>> resolved.
> >>>>
