RE: IDF maxDocs / numDocs

2014-03-13 Thread Markus Jelsma
Oh yes, I see what you mean. I would try SOLR-1632 to get distributed IDF, but it seems to be broken now.

-Original message-
> From: Steven Bower
> Sent: Wednesday 12th March 2014 21:47
> To: solr-user
> Subject: Re: IDF maxDocs / numDocs
>
> My problem is that

Re: IDF maxDocs / numDocs

2014-03-12 Thread Steven Bower
My problem is that maxDoc() and docCount() both include deleted documents in their values. Because of merging etc., those numbers can differ per replica (or at least that is what I'm seeing). I need a value that is consistent across replicas... I see in the comment it makes
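
To make the replica drift described above concrete, here is a minimal sketch in plain Python (not Lucene code). It assumes the classic formula idf = 1 + ln(numDocs / (docFreq + 1)) as used by Lucene's DefaultSimilarity at the time; the replica sizes and the names classic_idf and max_doc are made up for illustration. As a simplification, docFreq is held fixed, although in a real index it can also include deleted documents until segments are merged.

```python
import math

def classic_idf(doc_freq, num_docs):
    # Classic TF-IDF style idf: 1 + ln(numDocs / (docFreq + 1)).
    return 1.0 + math.log(num_docs / (doc_freq + 1))

# Hypothetical replicas holding the same 1000 live documents.
# Replica B has not merged away its deleted documents yet.
replica_a = {"live_docs": 1000, "deleted_docs": 0}
replica_b = {"live_docs": 1000, "deleted_docs": 250}

def max_doc(replica):
    # maxDoc-like value: counts deleted documents until they are merged away.
    return replica["live_docs"] + replica["deleted_docs"]

doc_freq = 9  # assumed identical on both replicas (simplification)

# idf based on a maxDoc-like value differs between the replicas...
idf_a = classic_idf(doc_freq, max_doc(replica_a))
idf_b = classic_idf(doc_freq, max_doc(replica_b))

# ...while idf based on the live document count is the same on both.
idf_live_a = classic_idf(doc_freq, replica_a["live_docs"])
idf_live_b = classic_idf(doc_freq, replica_b["live_docs"])

print("maxDoc-based idf:", idf_a, idf_b)
print("live-doc-based idf:", idf_live_a, idf_live_b)
```

The same query term scores differently on the two replicas only because one of them still carries deleted documents in its count, which is why a value that ignores deletes would be consistent.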

RE: IDF maxDocs / numDocs

2014-03-12 Thread Markus Jelsma
Hi Steve - it seems most similarities use CollectionStatistics.maxDoc() in idfExplain, but there's also a docCount(). We use docCount() in all our custom similarities, also because it allows you to have multiple languages in one index where one is much larger than the other. The small language will
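
A rough illustration of the maxDoc() versus docCount() difference for a mixed-language index, again as a plain Python sketch rather than Lucene code. It assumes each language lives in its own field (so the per-field document count plays the role of docCount()), the classic formula idf = 1 + ln(numDocs / (docFreq + 1)), and made-up index sizes; the function name classic_idf is hypothetical.

```python
import math

def classic_idf(doc_freq, num_docs):
    # Classic TF-IDF style idf: 1 + ln(numDocs / (docFreq + 1)).
    return 1.0 + math.log(num_docs / (doc_freq + 1))

# Hypothetical index: a large English field and a small Dutch field.
docs_with_english_field = 1_000_000
docs_with_dutch_field = 10_000
index_max_doc = docs_with_english_field + docs_with_dutch_field

# A Dutch term that occurs in 1% of the Dutch documents.
dutch_doc_freq = 100

# Index-wide count: the term looks extremely rare.
idf_max_doc = classic_idf(dutch_doc_freq, index_max_doc)

# Per-field count: rarity is measured against Dutch documents only.
idf_doc_count = classic_idf(dutch_doc_freq, docs_with_dutch_field)

# An English term that also occurs in 1% of its own language's documents.
english_doc_freq = 10_000
idf_english = classic_idf(english_doc_freq, docs_with_english_field)

print("Dutch term, index-wide count:", idf_max_doc)
print("Dutch term, per-field count:", idf_doc_count)
print("English term, per-field count:", idf_english)
```

With the per-field count, terms that are equally common within their own language end up with nearly the same idf, whereas the index-wide count inflates the idf of every term in the smaller language.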