Re: Single call for distributed IDF?

Joel Bernstein Tue, 24 Jan 2017 12:30:39 -0800

Reading your blogs now.

Joel Bernstein
http://joelsolr.blogspot.com/


On Tue, Jan 24, 2017 at 3:28 PM, Joel Bernstein <joels...@gmail.com> wrote:

> Ok, my mistake. I was thinking you were writing your own component and
> needed a fast way to get global IDF. It sounds like you're looking for
> fast global IDF during scoring. That seems like a reasonable thing to
> want.
>
> In the piggybacking approach you mention, does the aggregator node parse
> the query and fetch the IDF, then pass it along to the shards?
>
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Tue, Jan 24, 2017 at 2:01 PM, Walter Underwood <wun...@wunderwood.org>
> wrote:
>
>> Specifically, I’m talking about this:
>>
>>     <statsCache class="org.apache.solr.search.stats.LRUStatsCache"/>
>>
>> Adding that line increased our 95th percentile response time by 10
>> seconds.
>>
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>>
>>
>> > On Jan 24, 2017, at 10:43 AM, Joel Bernstein <joels...@gmail.com> wrote:
>> >
>> > Ah, I thought you were just interested in a fast way to get at IDF. This
>> > approach does take a callback but it's really fast.
>> >
>> > Joel Bernstein
>> > http://joelsolr.blogspot.com/
>> >
>> > On Tue, Jan 24, 2017 at 1:39 PM, Walter Underwood <wun...@wunderwood.org> wrote:
>> >
>> >> I know how to do it. You return df for each term and num_docs then
>> >> recalculate idf. I wrote up how we did it in Ultraseek XPA about ten
>> >> years ago, though with MonkeyRank instead of global IDF.
>> >>
>> >> https://observer.wunderwood.org/2007/04/04/progressive-reranking/
>> >>
>> >> I was wondering why Solr makes a separate request to each shard for that
>> >> information instead of piggybacking it on the original request.
>> >>
>> >> wunder
>> >> Walter Underwood
>> >> wun...@wunderwood.org
>> >> http://observer.wunderwood.org/  (my blog)
>> >>
>> >>
>> >>> On Jan 24, 2017, at 10:34 AM, Joel Bernstein <joels...@gmail.com> wrote:
>> >>>
>> >>> This may help out:
>> >>> https://github.com/apache/lucene-solr/blob/master/solr/solrj/src/java/org/apache/solr/client/solrj/io/stream/ScoreNodesStream.java#L208
>> >>>
>> >>> This points to some code that calculates global idf for a list of terms.
>> >>> Not sure if this matches your use case. It seems to be very fast.
>> >>>
>> >>> Joel Bernstein
>> >>> http://joelsolr.blogspot.com/
>> >>>
>> >>> On Tue, Jan 24, 2017 at 1:09 PM, Walter Underwood <wun...@wunderwood.org> wrote:
>> >>>
>> >>>> I tried running with the LRUStatsCache for global IDF, but the performance
>> >>>> penalty was pretty big. The 95th percentile response time went from 3.4
>> >>>> seconds to 13 seconds. Oops.
>> >>>>
>> >>>> We should not need a separate call to get the tf and df stats. Those are
>> >>>> already calculated when doing the first request. I worked on a search
>> >>>> engine that did it that way twenty years ago.
>> >>>>
>> >>>> In the past, there would have been an IP obstacle, but I think that is
>> >>>> resolved.
>> >>>>
>> >>>> wunder
>> >>>> Walter Underwood
>> >>>> wun...@wunderwood.org
>> >>>> http://observer.wunderwood.org/  (my blog)
>> >>>>
>> >>>>
>> >>>>
>> >>
>> >>
>>
>>
>
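
The merge step described in the thread (each shard returns df for each term plus its document count, and the aggregator sums them and recalculates idf) can be sketched roughly as follows. This is an illustration only, not Solr code: `ShardStats`, `merge_shard_stats`, `global_idf`, and `global_idf_table` are hypothetical names, and the BM25-style idf formula is an assumption about which similarity is in use.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class ShardStats:
    """Per-shard statistics, assumed to ride along with a shard's response."""
    num_docs: int   # number of documents on this shard
    doc_freq: dict  # term -> number of documents on this shard containing it


def merge_shard_stats(shard_stats):
    """Sum document counts and per-term document frequencies across shards."""
    total_docs = 0
    total_df = Counter()
    for stats in shard_stats:
        total_docs += stats.num_docs
        total_df.update(stats.doc_freq)  # adds counts term by term
    return total_docs, dict(total_df)


def global_idf(num_docs, doc_freq):
    """BM25-style idf (assumed formula; swap in whatever the similarity uses)."""
    return math.log(1 + (num_docs - doc_freq + 0.5) / (doc_freq + 0.5))


def global_idf_table(shard_stats):
    """Recalculate idf for every term from the merged, collection-wide stats."""
    num_docs, doc_freq = merge_shard_stats(shard_stats)
    return {term: global_idf(num_docs, df) for term, df in doc_freq.items()}


if __name__ == "__main__":
    # Made-up numbers for two shards, purely for illustration.
    shards = [
        ShardStats(num_docs=100, doc_freq={"solr": 10, "idf": 1}),
        ShardStats(num_docs=300, doc_freq={"solr": 30}),
    ]
    for term, idf in sorted(global_idf_table(shards).items()):
        print(term, round(idf, 4))
```

The sketch covers only the arithmetic on the aggregator. How the statistics travel (a separate stats request per shard, or piggybacked on the first request) is the open question in this thread and is not modeled here.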
