Hi Mark,

Yes, I understand it now; however, how will StringIndexCache size down in a production system faceting by Country on a homepage? This is Solr-specific...
Lucene-specific: Lucene doesn't read from disk if it can retrieve the field value for a specific document ID from the cache. How will it size down in a purely Lucene-based, heavily loaded production system? Especially if this cache is used for query optimizations.

> -----Original Message-----
> From: Mark Miller [mailto:markrmil...@gmail.com]
> Sent: November-02-09 8:53 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Lucene FieldCache memory requirements
>
>   static final class StringIndexCache extends Cache {
>     StringIndexCache(FieldCache wrapper) {
>       super(wrapper);
>     }
>
>     @Override
>     protected Object createValue(IndexReader reader, Entry entryKey)
>         throws IOException {
>       String field = StringHelper.intern(entryKey.field);
>       final int[] retArray = new int[reader.maxDoc()];
>       String[] mterms = new String[reader.maxDoc()+1];
>       TermDocs termDocs = reader.termDocs();
>       TermEnum termEnum = reader.terms (new Term (field));
>       int t = 0;  // current term number
>
>       // an entry for documents that have no terms in this field
>       // should a document with no terms be at top or bottom?
>       // this puts them at the top - if it is changed, FieldDocSortedHitQueue
>       // needs to change as well.
>       mterms[t++] = null;
>
>       try {
>         do {
>           Term term = termEnum.term();
>           if (term==null || term.field() != field) break;
>
>           // store term text
>           // we expect that there is at most one term per document
>           if (t >= mterms.length) throw new RuntimeException ("there are more terms than " +
>               "documents in field \"" + field + "\", but it's impossible to sort on " +
>               "tokenized fields");
>           mterms[t] = term.text();
>
>           termDocs.seek (termEnum);
>           while (termDocs.next()) {
>             retArray[termDocs.doc()] = t;
>           }
>
>           t++;
>         } while (termEnum.next());
>       } finally {
>         termDocs.close();
>         termEnum.close();
>       }
>
>       if (t == 0) {
>         // if there are no terms, make the term array
>         // have a single null entry
>         mterms = new String[1];
>       } else if (t < mterms.length) {
>         // if there are less terms than documents,
>         // trim off the dead array space
>         String[] terms = new String[t];
>         System.arraycopy (mterms, 0, terms, 0, t);
>         mterms = terms;
>       }
>
>       StringIndex value = new StringIndex (retArray, mterms);
>       return value;
>     }
>   };
>
> The formula for a String Index fieldcache is essentially the String
> array of unique terms (which does indeed "size down" at the bottom) and
> the int array indexing into the String array.
>
> Fuad Efendi wrote:
> > To be correct, I analyzed FieldCache awhile ago and I believed it never
> > "sizes down"...
> >
> >   /**
> >    * Expert: The default cache implementation, storing all values in memory.
> >    * A WeakHashMap is used for storage.
> >    *
> >    * <p>Created: May 19, 2004 4:40:36 PM
> >    *
> >    * @since lucene 1.4
> >    */
> >
> > Will it size down? Only if we are not faceting (as in SOLR v.1.3)...
> >
> > And I am still unsure, Document ID vs. Object Pointer.
> >
> >> I don't understand this:
> >>
> >>> so with a ton of docs and a few uniques, you get a temp boost in the RAM
> >>> reqs until it sizes it down.
> >>>
> >> Sizes down??? Why is it called Cache indeed? And how SOLR uses it if it is
> >> not cache?
> >>
>
> --
> - Mark
>
> http://www.lucidimagination.com
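To make that formula concrete, here is a quick back-of-envelope sketch of one StringIndex FieldCache entry (the int[] of term ordinals plus the trimmed String[] of unique terms). The document count, unique term count, and per-object overheads below are only assumed example numbers for illustration, not measurements from any real index:

    // Rough size of one StringIndex FieldCache entry (assumed example numbers).
    public class StringIndexMemEstimate {
        public static void main(String[] args) {
            long maxDoc = 100000000L;   // assumed reader.maxDoc(): 100M documents
            long uniqueTerms = 250;     // assumed unique Country values (t after the loop)
            long avgTermChars = 16;     // assumed average term length in chars

            // int[maxDoc] of term ordinals: one slot per document, never trimmed
            long ordArrayBytes = maxDoc * 4;
            // String[t] of unique terms after the arraycopy "size down" step:
            // reference + rough String/char[] object overhead + 2 bytes per char
            long termArrayBytes = uniqueTerms * (8 + 40 + avgTermChars * 2);

            System.out.printf("ordinals: ~%,d MB, unique terms: ~%,d KB%n",
                    ordArrayBytes / (1024 * 1024), termArrayBytes / 1024);
        }
    }

With those assumed numbers the per-document int[] is roughly 400 MB and dominates, while the unique-term String[] that gets trimmed is only a few KB. So the "size down" in the code only affects the term array; the int[maxDoc] keeps its full size and is released only when the IndexReader it is keyed on in the WeakHashMap is no longer referenced.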