Hi Walter,

Thank you for your help. I think you are right: the most important issue here is that "the most selective terms are rare". So I probably still need to implement distributed IDF to get better results.
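To make the concern concrete, here is a rough sketch (plain Python, not Solr code) comparing the IDF each shard would compute from its own statistics with the IDF computed from the summed statistics. It assumes the classic Lucene TF-IDF form idf = 1 + ln(numDocs / (docFreq + 1)); the shard sizes, the document frequencies, and the function names are made up for illustration.

```python
import math


def idf(doc_freq, num_docs):
    # Assumed classic Lucene TF-IDF form: 1 + ln(numDocs / (docFreq + 1)).
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def local_and_global_idf(shard_stats, term):
    """Return ([idf per shard], idf over all shards) for one term.

    shard_stats is a list of (num_docs, {term: doc_freq}) pairs,
    one pair per shard (hypothetical layout for this sketch).
    """
    local = [idf(dfs.get(term, 0), n) for n, dfs in shard_stats]
    total_docs = sum(n for n, _ in shard_stats)
    total_df = sum(dfs.get(term, 0) for _, dfs in shard_stats)
    return local, idf(total_df, total_docs)


def spread(values):
    # Difference between the largest and smallest per-shard IDF.
    return max(values) - min(values)


# Made-up numbers: "solr" is common on both shards, "zipf" is rare
# and unevenly distributed (1 document on one shard, 9 on the other).
shards = [
    (10000, {"solr": 5000, "zipf": 1}),
    (10000, {"solr": 5100, "zipf": 9}),
]

for term in ("solr", "zipf"):
    local, combined = local_and_global_idf(shards, term)
    print(term, [round(v, 3) for v in local], round(combined, 3))
```

With these numbers the per-shard IDFs for the common term nearly agree, while those for the rare term differ a lot, so scores from different shards are not comparable for exactly the terms that matter most for ranking.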

On Fri, Aug 31, 2012 at 8:36 AM, Walter Underwood <wun...@wunderwood.org> wrote:

> That is true if you randomly distribute the documents. If they are
> distributed according to topic, there can be some big anomalies.
>
> Also, the DFs for rare terms will have bigger errors. There is some
> statistical theorem about this, but I can't remember it right now. Thanks
> to Zipf, most of your terms are rare. Also, the most selective terms are
> rare.
>
> wunder
>
> On Aug 30, 2012, at 5:25 PM, Lance Norskog wrote:
>
> > The math for "confidence values" in probability theory shows that
> > distributed DF does not matter after not very many documents. If you
> > have 10s of thousands of documents in each shard, don't worry.
> >
> > On Thu, Aug 30, 2012 at 1:19 PM, Steven A Rowe <sar...@syr.edu> wrote:
> >> Hi Ke,
> >>
> >> Have you seen <https://issues.apache.org/jira/browse/SOLR-1632>?
> >>
> >> Steve
> >>
> >> -----Original Message-----
> >> From: Eric Wu [mailto:eirik...@gmail.com]
> >> Sent: Thursday, August 30, 2012 3:05 AM
> >> To: solr-user@lucene.apache.org
> >> Subject: Solr4 distributed IDF
> >>
> >> Hi there,
> >>
> >> Does there exist any issue ticket about the distributed IDF feature in
> >> solr4? Or maybe there already have some patches that I can use? Thank you
> >> very much.
> >>
> >> --
> >> Ke Wu,
> >> Best Regards

--
Ke Wu,
Best Regards