RE: Lucene FieldCache memory requirements

2009-11-03 Thread Fuad Efendi
-Fuad

> -Original Message-
> From: Michael McCandless [mailto:luc...@mikemccandless.com]
> Sent: November-03-09 5:00 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Lucene FieldCache memory requirements
>
> On Mon, Nov 2, 2009 at 9:27 PM, Fuad Efendi wrote:
> > I b

Re: Lucene FieldCache memory requirements

2009-11-03 Thread Michael McCandless
On Mon, Nov 2, 2009 at 9:27 PM, Fuad Efendi wrote:
> I believe this is correct estimate:
>
>> C. [maxdoc] x [4 bytes ~ (int) Lucene Document ID]
>>
>>   same as
>> [String1_Document_Count + ... + String10_Document_Count + ...]
>> x [4 bytes per DocumentID]

That's right. Except: as Mark said, you
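A minimal sketch of the arithmetic in the estimate quoted above, assuming exactly one 4-byte int per document and nothing else (no per-term strings, no JVM object overhead). The class and method names are made up for illustration and are not part of Lucene.

```java
final class DocIdArrayEstimate {
    // One int (4 bytes) per document, i.e. [maxdoc] x [4 bytes].
    static long orderArrayBytes(long maxDoc) {
        return maxDoc * Integer.BYTES;
    }

    public static void main(String[] args) {
        long maxDoc = 100_000_000L; // index size quoted in the original question
        long bytes = orderArrayBytes(maxDoc);
        System.out.printf("%d docs -> %d bytes (~%.2f GB)%n", maxDoc, bytes, bytes / 1e9);
    }
}
```

For 100,000,000 documents this comes to 400,000,000 bytes, which is independent of how many distinct values the field holds.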

RE: Lucene FieldCache memory requirements

2009-11-02 Thread Fuad Efendi
FieldCache uses a WeakHashMap internally... nothing wrong with that, but no amount of Garbage Collection tuning will help if the allocated RAM is not enough for replacing Weak** with Strong**, especially for SOLR faceting... 10%-15% of CPU taken by GC has been reported...

-Fuad

RE: Lucene FieldCache memory requirements

2009-11-02 Thread Fuad Efendi
Even in the simplest scenario, when it is Garbage Collected, we still _need_to_be_able_ to allocate enough RAM to FieldCache on demand... linear dependency on document count...

>
> Hi Mark,
>
> Yes, I understand it now; however, how will StringIndexCache size down in a
> production system facetin

RE: Lucene FieldCache memory requirements

2009-11-02 Thread Fuad Efendi
ll it size down in purely Lucene-based heavy-loaded production system? Especially if this cache is used for query optimizations. > -Original Message- > From: Mark Miller [mailto:markrmil...@gmail.com] > Sent: November-02-09 8:53 PM > To: solr-user@lucene.apache.org > Subject: R

RE: Lucene FieldCache memory requirements

2009-11-02 Thread Fuad Efendi
o be safe, use this in your basic memory estimates: [512Mb ~ 1Gb] + [non_tokenized_fields_count] x [maxdoc] x [8 bytes] -Fuad > -Original Message- > From: Fuad Efendi [mailto:f...@efendi.ca] > Sent: November-02-09 7:37 PM > To: solr-user@lucene.apache.org > Subject: RE: Lucene
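The rule of thumb above can be written out as a small calculation. This is only a sketch of that formula, assuming the base JVM footprint and the 8 bytes per document per field are taken exactly as stated in the message; the class name, method name, and the field count of 3 are hypothetical.

```java
final class SafeFieldCacheEstimate {
    // Rule-of-thumb figure from the message above: 8 bytes per document per non-tokenized field.
    static final long BYTES_PER_DOC_PER_FIELD = 8L;

    // [base] + [non_tokenized_fields_count] x [maxdoc] x [8 bytes]
    static long safeEstimateBytes(long baseBytes, int nonTokenizedFieldCount, long maxDoc) {
        return baseBytes + (long) nonTokenizedFieldCount * maxDoc * BYTES_PER_DOC_PER_FIELD;
    }

    public static void main(String[] args) {
        long baseLow = 512L * 1024 * 1024;   // lower end of the suggested base, 512 MB
        long baseHigh = 1024L * 1024 * 1024; // upper end of the suggested base, 1 GB
        int fields = 3;                      // hypothetical number of non-tokenized fields
        long maxDoc = 100_000_000L;          // index size from the original question
        double gib = (double) (1L << 30);
        System.out.printf("low estimate:  %.2f GiB%n", safeEstimateBytes(baseLow, fields, maxDoc) / gib);
        System.out.printf("high estimate: %.2f GiB%n", safeEstimateBytes(baseHigh, fields, maxDoc) / gib);
    }
}
```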

Re: Lucene FieldCache memory requirements

2009-11-02 Thread Mark Miller
static final class StringIndexCache extends Cache {
  StringIndexCache(FieldCache wrapper) {
    super(wrapper);
  }

  @Override
  protected Object createValue(IndexReader reader, Entry entryKey)
      throws IOException {
    String field = StringHelper.intern(entryKey.field);
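To make the "sizes it down" and "+1 on maxdoc" remarks in this thread concrete, here is a toy model of a string index: one int per document pointing into an array of unique terms, with a temporary worst-case terms array that is trimmed once the unique count is known. It is a simplified illustration under those assumptions, not the Lucene source; `StringIndexSketch`, `build`, and the reserved slot 0 for documents without a value are names and choices made for this sketch.

```java
import java.util.Arrays;
import java.util.TreeSet;

final class StringIndexSketch {
    final int[] order;     // one slot per document: index into lookup
    final String[] lookup; // slot 0 means "no value", then unique terms in sorted order

    StringIndexSketch(int[] order, String[] lookup) {
        this.order = order;
        this.lookup = lookup;
    }

    static StringIndexSketch build(String[] valuePerDoc) {
        int maxDoc = valuePerDoc.length;
        int[] order = new int[maxDoc];
        // Temporary worst case: every document has its own term, plus the "no value" slot.
        String[] terms = new String[maxDoc + 1];
        int count = 1; // slot 0 stays null for documents without a value

        TreeSet<String> unique = new TreeSet<>();
        for (String v : valuePerDoc) {
            if (v != null) unique.add(v);
        }
        for (String term : unique) {
            terms[count++] = term;
        }
        for (int doc = 0; doc < maxDoc; doc++) {
            String v = valuePerDoc[doc];
            order[doc] = (v == null) ? 0 : Arrays.binarySearch(terms, 1, count, v);
        }
        // "Size down": keep only as many slots as there are unique terms (plus slot 0).
        String[] lookup = Arrays.copyOf(terms, count);
        return new StringIndexSketch(order, lookup);
    }

    public static void main(String[] args) {
        String[] docs = {"USA", "Canada", "USA", null, "UK"};
        StringIndexSketch idx = build(docs);
        System.out.println("lookup = " + Arrays.toString(idx.lookup));
        System.out.println("order  = " + Arrays.toString(idx.order));
    }
}
```

In this model the per-document cost stays at one int regardless of the number of distinct values, while the oversized terms array is only a short-lived allocation during the build.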

RE: Lucene FieldCache memory requirements

2009-11-02 Thread Fuad Efendi
To be correct, I analyzed FieldCache a while ago and I believed it never "sizes down"...

/**
 * Expert: The default cache implementation, storing all values in memory.
 * A WeakHashMap is used for storage.
 *
 * Created: May 19, 2004 4:40:36 PM
 *
 * @since lucene 1.4
 */

Will it size down? Onl

RE: Lucene FieldCache memory requirements

2009-11-02 Thread Fuad Efendi
PM > To: solr-user@lucene.apache.org > Subject: RE: Lucene FieldCache memory requirements > > Mark, > > I don't understand this: > > so with a ton of docs and a few uniques, you get a temp boost in the RAM > > reqs until it sizes it down. > > Sizes down???

RE: Lucene FieldCache memory requirements

2009-11-02 Thread Fuad Efendi
Mark,

I don't understand this:
> so with a ton of docs and a few uniques, you get a temp boost in the RAM
> reqs until it sizes it down.

Sizes down??? Why is it called a Cache, then? And how does SOLR use it if it is not a cache?

And this:
> A pointer for each doc.

Why can't we use (int) DocumentID?

RE: Lucene FieldCache memory requirements

2009-11-02 Thread Fuad Efendi
I just did some tests on a completely new index (Slave): a sort by a low-distributed non-tokenized field (such as Country) takes milliseconds, but a sort (ascending) on a tokenized field with heavy distribution took 30 seconds (initially). The second sort (descending) took milliseconds. Generic query *:*; Fiel

Re: Lucene FieldCache memory requirements

2009-11-02 Thread Mark Miller
Fuad Efendi wrote: > Simple field (10 different values: Canada, USA, UK, ...), 64-bit JVM... no > difference between maxdoc and maxdoc + 1 for such estimate... difference is > between 0.4Gb and 1.2Gb... > > I'm not sure I understand - but I didn't mean to imply the +1 on maxdoc meant anything. T

RE: Lucene FieldCache memory requirements

2009-11-02 Thread Fuad Efendi
hope it is (int) Document ID... > -Original Message- > From: Mark Miller [mailto:markrmil...@gmail.com] > Sent: November-02-09 6:52 PM > To: solr-user@lucene.apache.org > Subject: Re: Lucene FieldCache memory requirements > > It also briefly requires more

Re: Lucene FieldCache memory requirements

2009-11-02 Thread Mark Miller
se, this is exceptionally wasteful. >>> > This is probably very common case... I think it should be confirmed by > Lucene developers too... FieldCache is warmed anyway, even when we don't use > SOLR... > > > -Fuad > > > > > > > >

RE: Lucene FieldCache memory requirements

2009-11-02 Thread Fuad Efendi
when we don't use SOLR... -Fuad > -Original Message- > From: Michael McCandless [mailto:luc...@mikemccandless.com] > Sent: November-02-09 6:00 PM > To: solr-user@lucene.apache.org > Subject: Re: Lucene FieldCache memory requirements > > OK I think someone who knows how S

Re: Lucene FieldCache memory requirements

2009-11-02 Thread Michael McCandless
this field) SOLR query for all documents *:* - in this case it will be fully > populated... > > >> Subject: Re: Lucene FieldCache memory requirements >> >> Which FieldCache API are you using?  getStrings?  or getStringIndex >> (which is used, under the hood, if you so

RE: Lucene FieldCache memory requirements

2009-11-02 Thread Fuad Efendi
ect: Re: Lucene FieldCache memory requirements > > Which FieldCache API are you using? getStrings? or getStringIndex > (which is used, under the hood, if you sort by this field). > > Mike > > On Mon, Nov 2, 2009 at 2:27 PM, Fuad Efendi wrote: > > Any thoughts regarding

Re: Lucene FieldCache memory requirements

2009-11-02 Thread Michael McCandless
ument-field instance... I am too lazy to research Lucene > source code, I hope someone can provide exact answer... Thanks > > >> Subject: Lucene FieldCache memory requirements >> >> Hi, >> >> >> Can anyone confirm Lucene FieldCache memory requirements? I hav

RE: Lucene FieldCache memory requirements

2009-11-02 Thread Fuad Efendi
Any thoughts regarding the subject? I hope FieldCache doesn't use more than 6 bytes per document-field instance... I am too lazy to research the Lucene source code; I hope someone can provide an exact answer... Thanks

> Subject: Lucene FieldCache memory requirements
>
> Hi,
>
>

Lucene FieldCache memory requirements

2009-10-30 Thread Fuad Efendi
Hi,

Can anyone confirm Lucene FieldCache memory requirements? I have 100 million docs with a non-tokenized field "country" (10 different countries); I expect it requires an array of ("int", "long"), size of array 100,000,000, without any impact of "country"