I think fieldValueCache is not per-segment; only fieldCache is. However,
unless I'm missing something, this cache is only used for faceting on
multivalued fields.
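
If you do lean on it for faceting, it can be sized explicitly in
solrconfig.xml. A minimal sketch (the values here are illustrative,
not recommendations):

  <fieldValueCache class="solr.FastLRUCache"
                   size="512"
                   autowarmCount="128"
                   showItems="32" />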


On Thu, Jan 17, 2013 at 8:58 PM, Erick Erickson <erickerick...@gmail.com> wrote:

> filterCache: This is bounded by (maxDoc / 8) bytes * (num filters in
> cache). Notice the /8. This reflects the fact that each filter is
> represented by a bitset over the _internal_ Lucene IDs, one bit per
> document. UniqueId has no bearing here whatsoever. This is, in a
> nutshell, why warming is required: the internal Lucene IDs may change.
> Note also that it's maxDoc; the internal arrays have "holes" for
> deleted documents.
>
> Note this is an _upper_ bound; if only a few docs match, the size
> will be (num of matching docs) * sizeof(int).
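>
> To put numbers on that with the figures from your mail (10M docs, and
> assuming maxDoc is around 10M): 10,000,000 / 8 = ~1.25MB per cached
> filter, so e.g. 512 cached filters top out around 640MB. A filter
> matching 1M docs could instead be kept as 1M * 4 bytes = ~4MB of ints,
> which is why the bitset form wins once more than maxDoc/32 docs match.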
>
> fieldValueCache: I don't think so, although I'm a bit fuzzy on this.
> It depends on whether these are "per-segment" caches or not; any
> "per-segment" cache is still valid after a commit.
>
> Think of documentCache as intended to hold the stored fields while
> various components operate on them, thus avoiding repeated fetches of
> the same data from disk. It's _usually_ not too big a worry.
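>
> For reference, it's sized in solrconfig.xml like the other caches; a
> minimal sketch (sizes illustrative only):
>
>   <documentCache class="solr.LRUCache"
>                  size="512"
>                  initialSize="512"
>                  autowarmCount="0" />
>
> Note autowarmCount is 0: the documentCache is keyed by internal Lucene
> IDs, so its entries can't usefully be autowarmed across searchers.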
>
> About hard commits once a day: that's _extremely_ long. Think instead
> of committing more frequently with openSearcher=false. If nothing
> else, your transaction log will grow lots and lots and lots. I'm
> thinking on the order of 15 minutes, or possibly even much less, with
> soft commits happening more often, maybe every 15 seconds. In fact,
> I'd start out with soft commits every 15 seconds and hard commits
> (openSearcher=false) every 5 minutes. The problem with hard commits
> happening once a day is that, if the server is interrupted for any
> reason, on startup Solr will try to replay the entire transaction log
> to assure index integrity. Not to mention that your tlog will be huge,
> and that there is some memory usage for each document in the tlog.
> Hard commits roll over the tlog, flush the in-memory tlog pointers,
> close index segments, etc.
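>
> A sketch of that setup in solrconfig.xml (the intervals are just the
> starting points suggested above; tune from there):
>
>   <updateHandler class="solr.DirectUpdateHandler2">
>     <!-- hard commit: flushes segments and rolls the tlog,
>          without opening a new searcher -->
>     <autoCommit>
>       <maxTime>300000</maxTime>            <!-- 5 minutes -->
>       <openSearcher>false</openSearcher>
>     </autoCommit>
>     <!-- soft commit: makes newly indexed docs visible -->
>     <autoSoftCommit>
>       <maxTime>15000</maxTime>             <!-- 15 seconds -->
>     </autoSoftCommit>
>   </updateHandler>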
>
> Best
> Erick
>
> On Thu, Jan 17, 2013 at 1:29 PM, Isaac Hebsh <isaac.he...@gmail.com>
> wrote:
> > Hi,
> >
> > I am going to build a big Solr (4.0?) index, which will hold some
> > dozens of millions of documents. Each document has some dozens of
> > fields, plus one big textual field.
> > The queries on the index are non-trivial and fairly long (possibly
> > hundreds of terms). No query is identical to another.
> >
> > Now, I want to analyze the cache performance (before setting up the
> > whole environment), in order to estimate how much RAM I will need.
> >
> > filterCache:
> > In my scenario, every query has some filters. Let's say that each
> > filter matches 1M documents out of 10M. Should the estimated memory
> > usage be 1M * sizeof(uniqueId) * num-of-filters-in-cache?
> >
> > fieldValueCache:
> > Since no two queries are alike, I guess that fieldValueCache is the
> > most important factor in query performance. Here comes a generic
> > question: I'm constantly indexing new documents, and soft commits
> > will be performed every 10 minutes. Does that mean the cache becomes
> > meaningless every 10 minutes?
> >
> > documentCache:
> > enableLazyFieldLoading will be enabled, and "fl" contains a very
> > small set of fields. BUT, I need to return highlighting on
> > (possibly) about 20 fields. Does the highlighting component use the
> > documentCache? I guess that highlighting requires the whole field to
> > be loaded into the documentCache. Will that happen only for fields
> > that matched a term from the query?
> >
> > And one more question: I'm planning to hard-commit once a day.
> > Should I prepare for significant RAM usage growth between
> > hard commits? (Consider a lot of new documents in this period...)
> > Does this RAM come from the same pool as the caches? Can an
> > OutOfMemory exception happen in this scenario?
> >
> > Thanks a lot.
>
