Thank you very much, Mike.

I found it:

org.apache.solr.request.SimpleFacets
...
    // TODO: future logic could use filters instead of the fieldcache if
    // the number of terms in the field is small enough.
    counts = getFieldCacheCounts(searcher, base, field, offset, limit, mincount, missing, sort, prefix);
...
    FieldCache.StringIndex si = FieldCache.DEFAULT.getStringIndex(searcher.getReader(), fieldName);
    final String[] terms = si.lookup;
    final int[] termNum = si.order;
...
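To make the two arrays in the excerpt above concrete, here is a minimal sketch of counting facet values from a string index of that shape. It is not the Solr implementation: the class and method names are hypothetical, and it assumes that slot 0 of the lookup array is reserved for documents with no value in the field.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not Solr or Lucene code.
class StringIndexSketch {

    // lookup: one entry per unique term (slot 0 assumed to mean "no value").
    // order:  one int per document, holding the index of its term in lookup.
    static Map<String, Integer> countByTerm(String[] lookup, int[] order) {
        int[] counts = new int[lookup.length];
        for (int ord : order) {
            counts[ord]++; // one increment per document
        }
        Map<String, Integer> result = new LinkedHashMap<>();
        for (int i = 1; i < lookup.length; i++) { // skip the "no value" slot
            if (counts[i] > 0) {
                result.put(lookup[i], counts[i]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] lookup = {null, "CA", "US"};
        int[] order = {1, 2, 2, 0}; // four documents, the last has no country
        System.out.println(countByTerm(lookup, order));
    }
}
```

In this layout the per-document cost is the int in order; the lookup array grows with the number of unique terms, not with the number of documents.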

So 64-bit requires more memory :)

Mike, am I right here?
[(8-byte pointer) + (4-byte DocID)] x [number of documents (100 million)]
(64-bit JVM)
1.2 GB of RAM for this...

Or maybe I am wrong:
> For Lucene directly, simple strings would consume a pointer (4 or 8
> bytes depending on whether your JRE is 64bit) per doc, and the string
> index would consume an int (4 bytes) per doc.

[8 bytes (64-bit)] x [number of documents (100 million)]?
0.8 GB

It is a kind of Map between String and DocSet, saving 4 bytes... The "key" is a String, and the "value" is an array of 64-bit pointers to Documents. Why 64-bit (for a 64-bit JVM)? I always thought it was an (int) documentId...

Am I right?

Thanks for pointing to http://issues.apache.org/jira/browse/LUCENE-1990!

>> Note that for your use case, this is exceptionally wasteful.
This is probably a very common case... I think it should be confirmed by Lucene developers too... FieldCache is warmed anyway, even when we don't use SOLR...

-Fuad

> -----Original Message-----
> From: Michael McCandless [mailto:luc...@mikemccandless.com]
> Sent: November-02-09 6:00 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Lucene FieldCache memory requirements
>
> OK I think someone who knows how Solr uses the fieldCache for this
> type of field will have to pipe up.
>
> For Lucene directly, simple strings would consume a pointer (4 or 8
> bytes depending on whether your JRE is 64bit) per doc, and the string
> index would consume an int (4 bytes) per doc. (Each also consumes
> negligible (for your case) memory to hold the actual string values).
>
> Note that for your use case, this is exceptionally wasteful. If
> Lucene had simple bit-packed ints (I've opened LUCENE-1990 for this)
> then it'd take much fewer bits to reference the values, since you have
> only 10 unique string values.
>
> Mike
>
> On Mon, Nov 2, 2009 at 3:57 PM, Fuad Efendi <f...@efendi.ca> wrote:
> > I am not using Lucene API directly; I am using SOLR which uses Lucene
> > FieldCache for faceting on non-tokenized fields...
> > I think this cache will be lazily loaded, until user executes sorted (by
> > this field) SOLR query for all documents *:* - in this case it will be fully
> > populated...
> >
> >
> >> Subject: Re: Lucene FieldCache memory requirements
> >>
> >> Which FieldCache API are you using? getStrings? or getStringIndex
> >> (which is used, under the hood, if you sort by this field).
> >>
> >> Mike
> >>
> >> On Mon, Nov 2, 2009 at 2:27 PM, Fuad Efendi <f...@efendi.ca> wrote:
> >> > Any thoughts regarding the subject? I hope FieldCache doesn't use more than
> >> > 6 bytes per document-field instance... I am too lazy to research Lucene
> >> > source code, I hope someone can provide exact answer... Thanks
> >> >
> >> >
> >> >> Subject: Lucene FieldCache memory requirements
> >> >>
> >> >> Hi,
> >> >>
> >> >>
> >> >> Can anyone confirm Lucene FieldCache memory requirements? I have 100
> >> >> millions docs with non-tokenized field "country" (10 different countries); I
> >> >> expect it requires array of ("int", "long"), size of array 100,000,000,
> >> >> without any impact of "country" field length;
> >> >>
> >> >> it requires 600,000,000 bytes: "int" is pointer to document (Lucene document
> >> >> ID), and "long" is pointer to String value...
> >> >>
> >> >> Am I right, is it 600Mb just for this "country" (indexed, non-tokenized,
> >> >> non-boolean) field and 100 millions docs? I need to calculate exact minimum RAM
> >> >> requirements...
> >> >>
> >> >> I believe it shouldn't depend on cardinality (distribution) of field...
> >> >>
> >> >> Thanks,
> >> >> Fuad
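
The per-document estimates discussed in this thread can be reproduced with a short calculation. This is only a sketch under the sizes quoted above (a 4-byte int per document for the string index, a 4- or 8-byte reference per document for plain strings); it ignores array headers and the memory for the string values themselves. The class and method names are hypothetical, and the bit-packed figure is a rough illustration of the idea behind LUCENE-1990, not a description of how that issue is implemented.

```java
// Hypothetical back-of-the-envelope estimator; not part of Lucene or Solr.
class FieldCacheEstimate {

    // Total bytes when every document costs a fixed number of bytes.
    static long bytesFor(long numDocs, int bytesPerDoc) {
        return numDocs * bytesPerDoc;
    }

    // Smallest number of bits that can distinguish uniqueValues values.
    static int bitsRequired(int uniqueValues) {
        return Math.max(1, 32 - Integer.numberOfLeadingZeros(uniqueValues - 1));
    }

    // Total bytes if each document stored only a bit-packed ordinal.
    static long packedBytes(long numDocs, int uniqueValues) {
        return (numDocs * bitsRequired(uniqueValues) + 7) / 8;
    }

    public static void main(String[] args) {
        long docs = 100_000_000L;
        System.out.println("int ordinal per doc:     " + bytesFor(docs, 4));
        System.out.println("8-byte pointer per doc:  " + bytesFor(docs, 8));
        System.out.println("pointer + int per doc:   " + bytesFor(docs, 12));
        System.out.println("bit-packed, 10 values:   " + packedBytes(docs, 10));
    }
}
```

With 100 million documents, the gap between a 4-byte ordinal per document and a 4-bit packed ordinal is a factor of eight, which is the waste Mike refers to for a field with only 10 unique values.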