This is the patent. The last assignee was Google; the patent expired in 2017. 
https://patents.google.com/patent/US5659732A/en  —wunder

> On Aug 27, 2024, at 12:01 PM, Walter Underwood <wun...@wunderwood.org> wrote:
> 
> When I’ve enabled global exact IDF in Solr, the speed penalty was about 10X. 
> Back in 1995, Infoseek figured out how to do that with no speed penalty. They 
> patented it, but that patent expired several years ago. I’ll try to hunt it 
> down.
> 
> Short version: from each shard, return the number of docs and the DF for each 
> term. When combining results, add up all the DFs, add up all the NUMDOCS, 
> divide, and you have the global IDF. This is constant for the whole result 
> list. Each shard already needs that info for its local score, so it shouldn’t 
> be extra work.
> 
> When does this matter? When the relevant documents for a term are mostly on 
> one shard, either intentionally or accidentally. Let’s say we have a news 
> search and all the stories for August 2024 are on one shard. The term 
> “kamala” will be much more common on that shard, giving a lower IDF, but…the 
> relevant documents are probably on that shard. So the best documents have a 
> lower score using local IDF.
> 
> This also shows up with lots of shards or small shards, because there will be 
> uneven distribution of docs. When I retired from LexisNexis, we had a cluster 
> with 320 shards. I’m sure that had some interesting IDF behavior.
> 
> I wrote up how we did this in a Java distributed search layer for Ultraseek: 
> https://observer.wunderwood.org/2007/04/04/progressive-reranking/
> 
> There is some earlier discussion here: 
> https://solr-user.lucene.apache.narkive.com/zNa1Hn4p/single-call-for-distributed-idf
> 
> I don’t think there is a Jira issue for this.
> 
> I think that is all the unfinished business since putting Solr 1.3 into 
> production at Netflix. Pretty darned good job everybody. Huge thanks to all 
> the contributors and committers who have put in years of effort over that 
> time.
> 
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
> 
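
For anyone who wants the short version above in code, here is a minimal sketch. It is not Solr code and not the patented implementation. The function names, the shard counts, and the log(N / df) form of IDF are all assumptions made up for illustration; the description above only says to sum the DFs, sum the NUMDOCS, and divide.

```python
import math
from collections import Counter


def merge_shard_stats(shard_stats):
    """Sum doc counts and per-term document frequencies across shards.

    shard_stats is a list of (num_docs, {term: df}) pairs, one per shard.
    """
    total_docs = 0
    total_df = Counter()
    for num_docs, df in shard_stats:
        total_docs += num_docs
        total_df.update(df)  # adds the per-term counts
    return total_docs, total_df


def idf(num_docs, df):
    # Assumption: classic log(N / df). A real scorer may smooth this differently.
    return math.log(num_docs / df) if df else 0.0


# Hypothetical news index: the first shard holds the August 2024 stories,
# so "kamala" is far more common there than on the other two shards.
shards = [
    (1000, {"kamala": 400}),
    (1000, {"kamala": 10}),
    (1000, {"kamala": 10}),
]

total_docs, total_df = merge_shard_stats(shards)
global_idf = idf(total_docs, total_df["kamala"])
local_idfs = [idf(n, df["kamala"]) for n, df in shards]

print("global:", round(global_idf, 3))
print("local: ", [round(x, 3) for x in local_idfs])
```

With these made-up numbers, the August shard has the lowest local IDF for the term, even though that is where the relevant stories are. The global value is the same for every shard, so scores computed with it are comparable across the whole result list.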
