It seems to me that another way to write the formula -- borrowing Python syntax -- is:
    4 * numDocs + 38 * len(uniqueTerms) + 2 * sum([len(t) for t in uniqueTerms])

That's 4 bytes per document, plus 38 bytes per term, plus 2 bytes * the sum of the lengths of the terms. (Numbers taken from http://martin.nobilitas.com/java/sizeof.html)

Does that seem right?

-Charlie

On Dec 4, 2007 12:31 PM, Charles Hornberger <[EMAIL PROTECTED]> wrote:
> > See Lucene's FieldCache.StringIndex
>
> To understand just what's getting stored for each string field, you
> may also want to look at the createValue() method of the inner Cache
> object instantiated as stringsIndexCache in FieldCacheImpl.java (line
> 399 in HEAD):
>
> http://svn.apache.org/viewvc/lucene/java/trunk/src/java/org/apache/lucene/search/FieldCacheImpl.java?view=markup
>
> -Charlie
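
P.S. For anyone who wants to plug in numbers, here is a minimal sketch of the estimate at the top of this message as a runnable Python function. The function name and the sample terms are made up for illustration. The constants are just the 4 / 38 / 2 byte figures quoted above, so the result is only as good as those figures. One caveat: Python's len() counts code points while Java strings count 16-bit chars, so the two agree only when no term contains characters outside the Basic Multilingual Plane.

```python
def estimate_string_index_bytes(num_docs, unique_terms):
    """Rough byte estimate using the per-item costs quoted in the message.

    num_docs     -- number of documents in the index
    unique_terms -- iterable of the distinct term strings for the field
    """
    bytes_per_doc = 4    # figure from the message (an int per document)
    bytes_per_term = 38  # figure from the message (per-String overhead)
    bytes_per_char = 2   # figure from the message (Java chars are 16-bit)

    terms = list(unique_terms)
    total_chars = sum(len(t) for t in terms)

    return (bytes_per_doc * num_docs
            + bytes_per_term * len(terms)
            + bytes_per_char * total_chars)


# Hypothetical example: 1000 documents, three unique terms (17 chars total).
# 4 * 1000 + 38 * 3 + 2 * 17 = 4000 + 114 + 34
print(estimate_string_index_bytes(1000, ["apple", "banana", "cherry"]))  # 4148
```

With a realistic index the middle and last terms dominate whenever the field has many distinct values, which is why the number of unique terms matters more than the document count for fields like IDs or titles.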