It seems to me that another way to write the formula -- borrowing
Python syntax -- is:

4 * numDocs + 38 * len(uniqueTerms) + 2 * sum([len(t) for t in uniqueTerms])

That's 4 bytes per document, plus 38 bytes of overhead per unique term,
plus 2 bytes per character summed over all the unique terms. (Numbers
taken from http://martin.nobilitas.com/java/sizeof.html)
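To sanity-check the arithmetic, here is the same estimate as a small
Python function. The name estimate_string_index_bytes is made up for
illustration. It assumes the term list is already de-duplicated, and it
uses the 38-byte per-String overhead from the sizeof page, which will
vary by JVM. Treat it as a rough sketch, not a measurement.

```python
def estimate_string_index_bytes(num_docs, unique_terms):
    """Rough memory estimate for a FieldCache.StringIndex.

    Hypothetical helper; per-item costs are the assumed figures above
    and will differ between JVMs.
    """
    order_bytes = 4 * num_docs                            # one int per document
    string_overhead = 38 * len(unique_terms)              # per-String object overhead (assumed)
    char_bytes = 2 * sum(len(t) for t in unique_terms)    # 2 bytes per Java char
    return order_bytes + string_overhead + char_bytes


# 1M docs, 3 unique terms of 5, 6 and 6 chars:
# 4,000,000 + 3 * 38 + 2 * 17 = 4,000,148 bytes
print(estimate_string_index_bytes(1_000_000, ["apple", "banana", "cherry"]))
```

For a field with many documents but few distinct values, the per-document
int array dominates; the per-term costs only matter when the field has
many unique values.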

Does that seem right?

-Charlie

On Dec 4, 2007 12:31 PM, Charles Hornberger
<[EMAIL PROTECTED]> wrote:
> > See Lucene's FieldCache.StringIndex
>
> To understand just what's getting stored for each string field, you
> may also want to look at the createValue() method of the inner Cache
> object instantiated as stringsIndexCache in FieldCacheImpl.java (line
> 399 in HEAD):
>
> http://svn.apache.org/viewvc/lucene/java/trunk/src/java/org/apache/lucene/search/FieldCacheImpl.java?view=markup
>
> -Charlie
>
