Re: Lucene FieldCache memory requirements

Mark Miller Mon, 02 Nov 2009 17:54:04 -0800

 static final class StringIndexCache extends Cache {
    StringIndexCache(FieldCache wrapper) {
      super(wrapper);
    }


    @Override
    protected Object createValue(IndexReader reader, Entry entryKey)
        throws IOException {
      String field = StringHelper.intern(entryKey.field);
      final int[] retArray = new int[reader.maxDoc()];
      String[] mterms = new String[reader.maxDoc()+1];
      TermDocs termDocs = reader.termDocs();
      TermEnum termEnum = reader.terms (new Term (field));
      int t = 0;  // current term number

      // an entry for documents that have no terms in this field
      // should a document with no terms be at top or bottom?
      // this puts them at the top - if it is changed, FieldDocSortedHitQueue
      // needs to change as well.
      mterms[t++] = null;

      try {
        do {
          Term term = termEnum.term();
          if (term==null || term.field() != field) break;

          // store term text
          // we expect that there is at most one term per document
          if (t >= mterms.length) throw new RuntimeException ("there are more terms than " +
                  "documents in field \"" + field + "\", but it's impossible to sort on " +
                  "tokenized fields");
          mterms[t] = term.text();

          termDocs.seek (termEnum);
          while (termDocs.next()) {
            retArray[termDocs.doc()] = t;
          }

          t++;
        } while (termEnum.next());
      } finally {
        termDocs.close();
        termEnum.close();
      }

      if (t == 0) {
        // if there are no terms, make the term array
        // have a single null entry
        mterms = new String[1];
      } else if (t < mterms.length) {
        // if there are less terms than documents,
        // trim off the dead array space
        String[] terms = new String[t];
        System.arraycopy (mterms, 0, terms, 0, t);
        mterms = terms;
      }

      StringIndex value = new StringIndex (retArray, mterms);
      return value;
    }
  };
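For anyone without a Lucene checkout, the same build-then-trim pattern can be sketched in plain Java. This is a hypothetical, simplified sketch: the class StringIndexSketch and its build method are made up for illustration, and a sorted map of term to document IDs stands in for TermEnum/TermDocs. Slot 0 of the term array is reserved for documents with no term, and the array is trimmed at the end, as createValue does above.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

/**
 * Hypothetical, Lucene-free sketch of the build-then-trim pattern in
 * StringIndexCache.createValue. The sorted map stands in for TermEnum/TermDocs.
 */
public class StringIndexSketch {

  final int[] order;      // per document: index into lookup (0 = no term)
  final String[] lookup;  // unique terms in sorted order; slot 0 is null

  StringIndexSketch(int[] order, String[] lookup) {
    this.order = order;
    this.lookup = lookup;
  }

  static StringIndexSketch build(SortedMap<String, int[]> postings, int maxDoc) {
    int[] order = new int[maxDoc];
    // Worst case: one unique term per document, plus the null slot.
    String[] terms = new String[maxDoc + 1];
    int t = 0;
    terms[t++] = null; // documents without a term sort first
    for (Map.Entry<String, int[]> e : postings.entrySet()) {
      if (t >= terms.length) {
        throw new IllegalStateException("more terms than documents");
      }
      terms[t] = e.getKey();
      for (int doc : e.getValue()) {
        order[doc] = t; // store the term's ordinal, not a String reference
      }
      t++;
    }
    // "Size down": keep only the slots actually used.
    if (t < terms.length) {
      terms = Arrays.copyOf(terms, t);
    }
    return new StringIndexSketch(order, terms);
  }

  public static void main(String[] args) {
    SortedMap<String, int[]> postings = new TreeMap<>();
    postings.put("blue", new int[] {1, 4});
    postings.put("red", new int[] {0, 2});
    // document 3 has no term in this field
    StringIndexSketch idx = build(postings, 5);
    System.out.println(Arrays.toString(idx.order));  // [2, 1, 2, 0, 1]
    System.out.println(Arrays.toString(idx.lookup)); // [null, blue, red]
  }
}
```

Each document holds an int ordinal rather than an object pointer, and only three String slots survive the trim even though six (maxDoc + 1) were allocated.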

The formula for a StringIndex FieldCache entry is essentially the String
array of unique terms (which does indeed "size down" at the bottom of
createValue) plus the int array, with one entry per document, indexing
into the String array.
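To put rough numbers on that formula, here is a back-of-the-envelope estimate in Java. It is a sketch under stated assumptions, not a measurement: StringIndexMemoryEstimate is a hypothetical helper, it ignores array and object headers, and the per-reference size (4 or 8 bytes depending on the JVM) and the average cost of one term String are guesses you supply. The peak figure adds the untrimmed String[maxDoc + 1] that is still live while the trimmed copy is made.

```java
/**
 * Hypothetical back-of-the-envelope estimate for one StringIndex FieldCache entry.
 * Ignores array/object headers; reference size and average String cost are assumptions.
 */
public class StringIndexMemoryEstimate {

  /** Bytes retained after createValue returns (the trimmed arrays plus term Strings). */
  static long steadyStateBytes(long maxDoc, long uniqueTerms,
                               int refBytes, long avgStringBytes) {
    long ordArray = 4L * maxDoc;                   // int[maxDoc]: one ordinal per document
    long termRefs = refBytes * (uniqueTerms + 1);  // trimmed String[], slot 0 is null
    long termData = avgStringBytes * uniqueTerms;  // the term String objects themselves
    return ordArray + termRefs + termData;
  }

  /** Peak at the trim step, when the untrimmed String[maxDoc + 1] is still live. */
  static long peakBytes(long maxDoc, long uniqueTerms,
                        int refBytes, long avgStringBytes) {
    long untrimmed = refBytes * (maxDoc + 1);
    return steadyStateBytes(maxDoc, uniqueTerms, refBytes, avgStringBytes) + untrimmed;
  }

  public static void main(String[] args) {
    long maxDoc = 10_000_000L;
    long uniqueTerms = 100L;
    int refBytes = 8;          // assumption: 64-bit JVM without compressed references
    long avgStringBytes = 64L; // assumption: rough cost of one short term String
    System.out.println(steadyStateBytes(maxDoc, uniqueTerms, refBytes, avgStringBytes)); // 40007208
    System.out.println(peakBytes(maxDoc, uniqueTerms, refBytes, avgStringBytes));        // 120007216
  }
}
```

Under these assumptions, with 10 million documents and only 100 unique terms, the untrimmed reference array (about 80 MB) is twice the size of the int array (about 40 MB), so the peak during the build is roughly three times what stays in the cache. That is the temporary boost in RAM that goes away once the array is sized down.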


Fuad Efendi wrote:
> To be correct, I analyzed FieldCache a while ago and I believed it never
> "sizes down"...
>
> /**
>  * Expert: The default cache implementation, storing all values in memory.
>  * A WeakHashMap is used for storage.
>  *
>  * <p>Created: May 19, 2004 4:40:36 PM
>  *
>  * @since   lucene 1.4
>  */
>
>
> Will it size down? Only if we are not faceting (as in SOLR v.1.3)...
>
> And I am still unsure about Document ID vs. Object Pointer.
>
>> I don't understand this:
>>     
>>> so with a ton of docs and a few uniques, you get a temp boost in the RAM
>>> reqs until it sizes it down.
>>>       
>> Sizes down??? Why is it called a Cache, then? And how does SOLR use it if
>> it is not a cache?
>>


-- 
- Mark

http://www.lucidimagination.com
