Hi Walter,

    Thank you for your help. I think you are right: the most important
issue here is that "the most selective terms are rare". So I probably still
need to implement distributed IDF to get better results.
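
A minimal sketch of what distributed IDF does, assuming each shard can report its document count and per-term document frequencies. The `ShardStats` structure and the helper names below are hypothetical; they are not Solr's API or the SOLR-1632 patch. The IDF is assumed to have the form 1 + ln(N / (df + 1)), similar to Lucene's classic TF-IDF similarity. The global IDF is computed from DFs summed over all shards, so every shard scores a given term the same way:

```python
import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ShardStats:
    """Per-shard statistics a shard would report (hypothetical structure)."""
    num_docs: int
    doc_freq: dict[str, int] = field(default_factory=dict)


def idf(doc_freq: int, num_docs: int) -> float:
    """Assumed classic TF-IDF style IDF: 1 + ln(N / (df + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def global_idf(shards: list[ShardStats], terms: list[str]) -> dict[str, float]:
    """Sum document counts and per-term DFs over all shards, then compute one IDF per term."""
    total_docs = sum(s.num_docs for s in shards)
    total_df = Counter()
    for s in shards:
        for t in terms:
            total_df[t] += s.doc_freq.get(t, 0)
    return {t: idf(total_df[t], total_docs) for t in terms}


def local_idf(shard: ShardStats, terms: list[str]) -> dict[str, float]:
    """What a single shard computes on its own, without distributed statistics."""
    return {t: idf(shard.doc_freq.get(t, 0), shard.num_docs) for t in terms}


# Two shards with topical skew: "solr" is common on shard A and rare on shard B.
shard_a = ShardStats(num_docs=10_000, doc_freq={"solr": 900, "the": 9_500})
shard_b = ShardStats(num_docs=10_000, doc_freq={"solr": 10, "the": 9_600})

terms = ["solr", "the"]
print("global :", global_idf([shard_a, shard_b], terms))
print("shard A:", local_idf(shard_a, terms))
print("shard B:", local_idf(shard_b, terms))
```

With these made-up, topically skewed shards, shard B on its own weights "solr" much more heavily than the global statistics do, and shard A weights it less, so the same document would score differently depending on which shard holds it.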

On Fri, Aug 31, 2012 at 8:36 AM, Walter Underwood <wun...@wunderwood.org> wrote:

> That is true if you randomly distribute the documents. If they are
> distributed according to topic, there can be some big anomalies.
>
> Also, the DFs for rare terms will have bigger errors. There is some
> statistical theorem about this, but I can't remember it right now. Thanks
> to Zipf, most of your terms are rare. Also, the most selective terms are
> rare.
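
To make the rare-term point concrete, here is a small simulation, assuming each document is assigned to a uniformly random shard (the random case only; topic-based sharding is not modeled). Each shard's DF, scaled up by the number of shards, is treated as an estimate of the global DF. The function name and parameters are illustrative, not from any library:

```python
import random
import statistics


def mean_relative_error(term_df: int, num_shards: int, trials: int, seed: int = 0) -> float:
    """Spread the term_df documents containing a term over shards at random and
    return the mean relative error of each shard's scaled-up DF estimate."""
    rng = random.Random(seed)
    errors = []
    for _ in range(trials):
        # Only documents that contain the term affect a shard's DF.
        counts = [0] * num_shards
        for _doc in range(term_df):
            counts[rng.randrange(num_shards)] += 1
        for c in counts:
            estimate = c * num_shards  # what this shard "thinks" the global DF is
            errors.append(abs(estimate - term_df) / term_df)
    return statistics.mean(errors)


for df in (5_000, 20):
    err = mean_relative_error(df, num_shards=4, trials=200)
    print(f"global DF={df:>5}: mean relative error of a shard's DF estimate = {err:.1%}")
```

Under this assumption the error depends on how many documents contain the term, not on how many documents each shard holds, so a rare term is estimated far less reliably than a common one.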
>
> wunder
>
> On Aug 30, 2012, at 5:25 PM, Lance Norskog wrote:
>
> > The math for "confidence values" in probability theory shows that
> > distributed DF stops mattering after a fairly small number of documents.
> > If you have tens of thousands of documents in each shard, don't worry.
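
A back-of-the-envelope version of this confidence argument, assuming documents are assigned to S shards uniformly at random and treating each shard's document count as exact: a term with global document frequency d appears in n ~ Binomial(d, 1/S) documents on a given shard, and the scaled estimate of d is S·n. Its relative standard error depends only on d:

```latex
\begin{aligned}
n &\sim \mathrm{Binomial}\!\left(d, \tfrac{1}{S}\right), \qquad \hat{d} = S\,n, \qquad \mathbb{E}[\hat{d}] = d, \\
\operatorname{Var}(\hat{d}) &= S^{2} \cdot d \cdot \tfrac{1}{S}\left(1 - \tfrac{1}{S}\right) = d\,(S-1), \\
\frac{\sqrt{\operatorname{Var}(\hat{d})}}{d} &= \sqrt{\frac{S-1}{d}}, \qquad
\operatorname{sd}\!\left(\ln \hat{d}\right) \approx \sqrt{\frac{S-1}{d}} \quad \text{(first order)}.
\end{aligned}
```

Since an IDF of the form ln(N/d) moves by about the same amount as ln d, with S = 4 a term with d = 10,000 gets roughly 1.7% relative error, while a term with d = 20 gets roughly 39%, however many documents each shard holds.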
> >
> > On Thu, Aug 30, 2012 at 1:19 PM, Steven A Rowe <sar...@syr.edu> wrote:
> >> Hi Ke,
> >>
> >> Have you seen <https://issues.apache.org/jira/browse/SOLR-1632>?
> >>
> >> Steve
> >>
> >> -----Original Message-----
> >> From: Eric Wu [mailto:eirik...@gmail.com]
> >> Sent: Thursday, August 30, 2012 3:05 AM
> >> To: solr-user@lucene.apache.org
> >> Subject: Solr4 distributed IDF
> >>
> >> Hi there,
> >>
> >> Does an issue ticket exist for the distributed IDF feature in
> >> Solr 4? Or are there already some patches that I can use? Thank you
> >> very much.
> >>
> >> --
> >> Ke Wu,
> >> Best Regards


-- 
Ke Wu,
Best Regards
