Sorry Mike, Mark, I am confused again... Yes, I need some more memory for processing ("while the FieldCache is being loaded"), obviously, but that was not my main point...
With StringIndexCache, I have 10 arrays (the cardinality of this field is 10) storing (int) Lucene Document IDs.

> Except: as Mark said, you'll also need transient memory = pointer (4
> or 8 bytes) * (1+maxdoc), while the FieldCache is being loaded.

Ok, I see it:

final int[] retArray = new int[reader.maxDoc()];
String[] mterms = new String[reader.maxDoc()+1];

I can't trace it right now (limited in time), but I think mterms is a local variable and will size down to 0...

So the correct formula is a weird one... if you don't want unexpected OOM or an overloaded GC (WeakHashMaps...):

[some heap] + [Non-Tokenized_Field_Count] x [maxdoc] x [4 bytes + 8 bytes] (for 64-bit)

-Fuad

> -----Original Message-----
> From: Michael McCandless [mailto:luc...@mikemccandless.com]
> Sent: November-03-09 5:00 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Lucene FieldCache memory requirements
>
> On Mon, Nov 2, 2009 at 9:27 PM, Fuad Efendi <f...@efendi.ca> wrote:
> > I believe this is correct estimate:
> >
> >> C. [maxdoc] x [4 bytes ~ (int) Lucene Document ID]
> >>
> >> same as
> >> [String1_Document_Count + ... + String10_Document_Count + ...]
> >> x [4 bytes per DocumentID]
>
> That's right.
>
> Except: as Mark said, you'll also need transient memory = pointer (4
> or 8 bytes) * (1+maxdoc), while the FieldCache is being loaded. After
> it's done being loaded, this sizes down to the number of unique terms.
>
> But, if Lucene did the basic int packing, which really we should do,
> since you only have 10 unique values, with a naive 4 bits per doc
> encoding, you'd only need 1/8th the memory usage. We could do a bit
> better by encoding more than one document at a time...
>
> Mike
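For anyone following along: a minimal sketch of the naive 4-bits-per-doc packing Mike describes. This is not Lucene's actual implementation (the class name PackedOrdinals and its methods are made up for illustration); it just shows why 10 unique terms need only one nibble per document, cutting a full int[maxDoc] (4 bytes/doc) down to half a byte per doc, i.e. 1/8th the memory.

```java
// Hypothetical sketch: store one 4-bit term ordinal per document,
// two documents per byte. Works for up to 16 unique terms (ord 0..15).
public class PackedOrdinals {
    private final byte[] packed; // two 4-bit ordinals per byte

    public PackedOrdinals(int maxDoc) {
        // maxDoc docs, 2 per byte, rounded up
        packed = new byte[(maxDoc + 1) / 2];
    }

    public void set(int docId, int ord) { // ord must fit in 4 bits
        int idx = docId >> 1;
        if ((docId & 1) == 0) {
            packed[idx] = (byte) ((packed[idx] & 0xF0) | ord);        // low nibble
        } else {
            packed[idx] = (byte) ((packed[idx] & 0x0F) | (ord << 4)); // high nibble
        }
    }

    public int get(int docId) {
        int b = packed[docId >> 1] & 0xFF;
        return ((docId & 1) == 0) ? (b & 0x0F) : (b >> 4);
    }
}
```

With maxDoc = 1,000,000 this is ~500 KB instead of ~4 MB for the int[] version; "encoding more than one document at a time" could shave it further (e.g. 10 values fit in log2(10) ≈ 3.33 bits/doc).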