Re: ramBufferSizeMB not reflected in segment sizes in index

2010-12-02 Thread Michael McCandless
On Thu, Dec 2, 2010 at 4:31 PM, Burton-West, Tom wrote:
> We turned on infostream. Is there documentation about how to interpret it,
> or should I just grep through the codebase?
There isn't any documentation... and it changes over time as we add new diagnostics.
> Is the excerpt below what

Re: ramBufferSizeMB not reflected in segment sizes in index

2010-12-02 Thread Yonik Seeley
On Wed, Dec 1, 2010 at 3:01 PM, Shawn Heisey wrote:
> I have seen this. In Solr 1.4.1, the .fdt, .fdx, and the .tv* files do not
> segment, but all the other files do. I can't remember whether it behaves
> the same under 3.1, or whether it also creates these files in each segment.
Yep, that's t

RE: ramBufferSizeMB not reflected in segment sizes in index

2010-12-02 Thread Burton-West, Tom
-user@lucene.apache.org
Subject: Re: ramBufferSizeMB not reflected in segment sizes in index
On Wed, Dec 1, 2010 at 3:16 PM, Burton-West, Tom wrote:
> Thanks Mike,
>
> Yes we have many unique terms due to dirty OCR and 400 languages and probably
> lots of low doc freq terms as well (altho

Re: ramBufferSizeMB not reflected in segment sizes in index

2010-12-01 Thread Michael McCandless
On Wed, Dec 1, 2010 at 3:16 PM, Burton-West, Tom wrote:
> Thanks Mike,
>
> Yes we have many unique terms due to dirty OCR and 400 languages and probably
> lots of low doc freq terms as well (although with the ICUTokenizer and
> ICUFoldingFilter we should get fewer terms due to bad tokenization a

RE: ramBufferSizeMB not reflected in segment sizes in index

2010-12-01 Thread Burton-West, Tom
n the production indexer. If it doesn't I'll turn it on and post here.
Tom
-----Original Message-----
From: Michael McCandless [mailto:luc...@mikemccandless.com]
Sent: Wednesday, December 01, 2010 2:43 PM
To: solr-user@lucene.apache.org
Subject: Re: ramBufferSizeMB not reflected in s

Re: ramBufferSizeMB not reflected in segment sizes in index

2010-12-01 Thread Shawn Heisey
On 12/1/2010 12:13 PM, Burton-West, Tom wrote: We have set the ramBufferSizeMB to 320 in both the indexDefaults and the mainIndex sections of our solrconfig.xml: 320 20 We expected that this would mean that the index would not write to disk until it reached somewhere approximately over 300MB

Re: ramBufferSizeMB not reflected in segment sizes in index

2010-12-01 Thread Michael McCandless
The RAM efficiency (= size of segment once flushed divided by size of RAM buffer) can vary drastically. Because the in-RAM data structures must be "growable" (to append new docs to the postings as they are encountered), the efficiency is never 100%. I think 50% is actually a "good" RAM efficiency
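As a rough illustration of the ratio defined in this message, here is a minimal Python sketch. It is not from the thread: the function name `ram_efficiency` and the 160 MB flushed segment size are hypothetical, chosen only to show the arithmetic. The 320 MB buffer matches the ramBufferSizeMB setting mentioned in the thread.

```python
MB = 1024 * 1024  # bytes per megabyte


def ram_efficiency(flushed_segment_bytes: int, ram_buffer_bytes: int) -> float:
    """Return the size of the flushed segment divided by the RAM buffer size.

    Hypothetical helper for illustration; it only restates the ratio
    described in the message above.
    """
    if ram_buffer_bytes <= 0:
        raise ValueError("ram_buffer_bytes must be positive")
    return flushed_segment_bytes / ram_buffer_bytes


if __name__ == "__main__":
    ram_buffer = 320 * MB       # ramBufferSizeMB = 320, as set in the thread
    flushed_segment = 160 * MB  # hypothetical on-disk size of one flushed segment
    print(f"{ram_efficiency(flushed_segment, ram_buffer):.0%}")  # prints "50%"
```

Under these assumed numbers, a 320 MB buffer that flushes a 160 MB segment has an efficiency of 50%, so segments well below the configured buffer size are not by themselves a sign of misconfiguration.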

ramBufferSizeMB not reflected in segment sizes in index

2010-12-01 Thread Burton-West, Tom
We are using a recent Solr 3.x (See below for exact version). We have set the ramBufferSizeMB to 320 in both the indexDefaults and the mainIndex sections of our solrconfig.xml: 320 20 We expected that this would mean that the index would not write to disk until it reached somewhere approximate