Re: out of heap space, every day

2007-12-04 Thread Charles Hornberger
It seems to me that another way to write the formula -- borrowing Python syntax -- is: 4 * numDocs + 38 * len(uniqueTerms) + 2 * sum([len(t) for t in uniqueTerms]) That's 4 bytes per document, plus 38 bytes per term, plus 2 bytes * the sum of the lengths of the terms. (Numbers taken from http://m
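The formula above can be made executable; a minimal sketch, where `unique_terms` stands in for the distinct terms of the sort field (the 4/38/2-byte constants are the thread's estimates, not measured values):

```python
def fieldcache_bytes(num_docs, unique_terms):
    """Heap estimate per the formula above: 4 bytes per document
    (the ord array), ~38 bytes of String object overhead per unique
    term, and 2 bytes per character of each term (Java chars are
    UTF-16)."""
    return (4 * num_docs
            + 38 * len(unique_terms)
            + 2 * sum(len(t) for t in unique_terms))

terms = ["apple", "banana", "cherry"]
print(fieldcache_bytes(1000, terms))  # 4000 + 114 + 34 = 4148
```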

Re: out of heap space, every day

2007-12-04 Thread Charles Hornberger
> See Lucene's FieldCache.StringIndex To understand just what's getting stored for each string field, you may also want to look at the createValue() method of the inner Cache object instantiated as stringsIndexCache in FieldCacheImpl.java (line 399 in HEAD): http://svn.apache.org/viewvc/lucene/ja

Re: out of heap space, every day

2007-12-04 Thread Yonik Seeley
On Dec 4, 2007 3:11 PM, Norskog, Lance <[EMAIL PROTECTED]> wrote: > "String[nTerms()]": Does this mean that you compare the first term, then > the second, etc.? Otherwise I don't understand how to compare multiple > terms in two records. Lucene sorting only supports a single term per document for

RE: out of heap space, every day

2007-12-04 Thread Norskog, Lance
Sent: Tuesday, December 04, 2007 8:06 AM To: solr-user@lucene.apache.org Subject: Re: out of heap space, every day On Dec 4, 2007 10:59 AM, Brian Whitman <[EMAIL PROTECTED]> wrote: > > > > For faceting and sorting, yes. For normal search, no. > > > > Interesting you m

Re: out of heap space, every day

2007-12-04 Thread Brian Whitman
int[maxDoc()] + String[nTerms()] + size_of_all_unique_terms. Then double that to allow for a warming searcher. This is great, but can you help me parse this? Assume 8M docs and I'm sorting on an int field that is unix time (seconds since epoch). For the purposes of the experiment assume eve
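A rough back-of-the-envelope for this 8M-doc case can be sketched in Python; the worst-case assumption of one unique term per document and the ~10-character term length are assumptions for illustration, not measurements:

```python
# Sketch of the heap estimate quoted above, for 8M docs sorted on a
# field where (worst case) every document carries a unique term.
num_docs = 8_000_000
num_terms = 8_000_000          # assumption: one unique timestamp per doc
avg_term_chars = 10            # assumption: ~10-char encoded terms

ord_array = 4 * num_docs                      # int[maxDoc()]
string_overhead = 38 * num_terms              # per-String overhead (estimate)
term_chars = 2 * num_terms * avg_term_chars   # UTF-16 chars of the terms
total = ord_array + string_overhead + term_chars

print(total)       # 496_000_000 bytes, ~473 MB per searcher
print(2 * total)   # doubled for a warming searcher, ~946 MB
```

Under these assumptions a single sint sort cache lands near half a gigabyte, so a multi-gigabyte heap is plausible once several caches and warming searchers coexist.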

RE: out of heap space, every day

2007-12-04 Thread Norskog, Lance
On Behalf Of Yonik Seeley Sent: Tuesday, December 04, 2007 8:06 AM To: solr-user@lucene.apache.org Subject: Re: out of heap space, every day On Dec 4, 2007 10:59 AM, Brian Whitman <[EMAIL PROTECTED]> wrote: > > > > For faceting and sorting, yes. For normal search, no. > > >

Re: out of heap space, every day

2007-12-04 Thread Mike Klaas
On 4-Dec-07, at 8:10 AM, Brian Carmalt wrote: Hello, I am also fighting with heap exhaustion, however during the indexing step. I was able to minimize, but not fix the problem by setting the thread stack size to 64k with "-Xss64k". The minimum size is OS-specific, but the VM will tell you

Re: out of heap space, every day

2007-12-04 Thread Brian Carmalt
Hello, I am also fighting with heap exhaustion, however during the indexing step. I was able to minimize, but not fix the problem by setting the thread stack size to 64k with "-Xss64k". The minimum size is OS-specific, but the VM will tell you if you set the size too small. You can try it, it
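The stack-size tweak described above can be passed on the JVM command line; a sketch only, where the heap size and jar name are placeholders for your own setup:

```shell
# Shrink each thread's native stack to 64 KB (-Xss64k) to reduce
# per-thread memory during heavily threaded indexing; heap size and
# launcher jar below are placeholders, not values from the thread.
java -Xss64k -Xmx1024m -jar start.jar
```

If 64k is below the OS minimum, the JVM refuses to start and reports the smallest allowed value, so it is safe to probe downward.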

Re: out of heap space, every day

2007-12-04 Thread Yonik Seeley
On Dec 4, 2007 10:59 AM, Brian Whitman <[EMAIL PROTECTED]> wrote: > > > > For faceting and sorting, yes. For normal search, no. > > > > Interesting you mention that, because one of the other changes since > last week besides the index growing is that we added a sort to an > sint field on the queri

Re: out of heap space, every day

2007-12-04 Thread Brian Whitman
For faceting and sorting, yes. For normal search, no. Interesting you mention that, because one of the other changes since last week besides the index growing is that we added a sort to an sint field on the queries. Is it reasonable that a sint sort would require over 2.5GB of heap on

Re: out of heap space, every day

2007-12-04 Thread Yonik Seeley
On Dec 4, 2007 10:46 AM, Brian Whitman <[EMAIL PROTECTED]> wrote: > Are there 'native' memory requirements for solr as a function of > index size? For faceting and sorting, yes. For normal search, no. -Yonik