I think fieldValueCache is not per-segment; only fieldCache is. However, unless I'm missing something, this cache is only used for faceting on multivalued fields.
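For reference, this cache can be declared explicitly in solrconfig.xml. A minimal sketch, matching the commented-out example in the stock config (the size/autowarm values here are illustrative, not recommendations):

    <!-- solrconfig.xml: explicit fieldValueCache declaration (illustrative values) -->
    <fieldValueCache class="solr.FastLRUCache"
                     size="512"
                     autowarmCount="128"
                     showItems="32"/>

If it is not declared, Solr creates one internally with default settings.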
On Thu, Jan 17, 2013 at 8:58 PM, Erick Erickson <erickerick...@gmail.com> wrote:

> filterCache: This is bounded by (maxDoc / 8) bytes * (num filters in
> cache). Notice the /8. This reflects the fact that the filters are
> represented by a bitset over the _internal_ Lucene IDs; uniqueId has no
> bearing here whatsoever. This is, in a nutshell, why warming is
> required: the internal Lucene IDs may change. Note also that it's
> maxDoc; the internal arrays have "holes" for deleted documents.
> (A worked sizing example follows below the quoted thread.)
>
> Note this is an _upper_ bound: if only a few docs match, the size
> will be (num of matching docs) * sizeof(int).
>
> fieldValueCache: I don't think so, although I'm a bit fuzzy on this.
> It depends on whether these are "per-segment" caches or not. Any
> "per-segment" cache is still valid.
>
> Think of documentCache as intended to hold the stored fields while
> various components operate on them, thus avoiding repeatedly fetching
> the data from disk. It's _usually_ not too big a worry.
>
> About hard commits once a day: that's _extremely_ long. Think instead
> of committing more frequently with openSearcher=false. If nothing
> else, your transaction log will grow lots and lots and lots. I'm
> thinking on the order of 15 minutes, or possibly even much less, with
> soft commits happening more often, maybe every 15 seconds. In fact,
> I'd start out with soft commits every 15 seconds and hard commits
> (openSearcher=false) every 5 minutes; a config sketch follows below
> the quoted thread. The problem with hard commits happening once a day
> is that, if the server is interrupted for any reason, on startup Solr
> will try to replay the entire transaction log to assure index
> integrity. Not to mention that your tlog will be huge, and that there
> is some memory usage for each document in the tlog. Hard commits roll
> over the tlog, flush the in-memory tlog pointers, close index
> segments, etc.
>
> Best
> Erick
>
> On Thu, Jan 17, 2013 at 1:29 PM, Isaac Hebsh <isaac.he...@gmail.com> wrote:
> > Hi,
> >
> > I am going to build a big Solr (4.0?) index, which will hold some tens
> > of millions of documents. Each document has a few dozen fields, and one
> > big textual field.
> > The queries on the index are non-trivial and somewhat long (they might
> > contain hundreds of terms). No query is identical to another.
> >
> > Now, I want to analyze the cache performance (before setting up the
> > whole environment), in order to estimate how much RAM I will need.
> >
> > filterCache:
> > In my scenario, every query has some filters. Let's say that each
> > filter matches 1M documents, out of 10M. Should the estimated memory
> > usage be 1M * sizeof(uniqueId) * num-of-filters-in-cache?
> >
> > fieldValueCache:
> > Since no two queries are alike, I guess that fieldValueCache is the
> > most important factor in query performance. Here comes a generic
> > question: I'm indexing new documents constantly, and soft commits will
> > be performed every 10 minutes. Does that mean the cache becomes
> > meaningless every 10 minutes?
> >
> > documentCache:
> > enableLazyFieldLoading will be enabled, and "fl" contains a very small
> > set of fields. BUT, I need to return highlighting on about (possibly)
> > 20 fields. Does the highlighting component use the documentCache? I
> > guess that highlighting requires the whole field to be loaded into the
> > documentCache. Will that happen only for fields that matched a term
> > from the query?
> >
> > And one more question: I'm planning to hard-commit once a day. Should I
> > prepare for significant RAM usage growth between hard commits?
> > (Consider a lot of new documents arriving in this period...)
> > Does this RAM come from the same pool as the caches? Can an OutOfMemory
> > exception happen in this scenario?
> >
> > Thanks a lot.
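To make the filterCache arithmetic above concrete with Isaac's numbers: each cached filter is a bitset of maxDoc bits, so with maxDoc = 10M an entry costs about 10,000,000 / 8 = 1.25 MB whether it matches 1M documents or all of them (only very sparse results fall back to the (num of matching docs) * sizeof(int) form). A cache of 512 entries is therefore bounded by roughly 512 * 1.25 MB ≈ 640 MB. A sketch of the declaration, with illustrative values only:

    <!-- solrconfig.xml: filterCache; upper bound ~ size * maxDoc/8 bytes
         (512 * 1.25 MB ~ 640 MB for maxDoc = 10M) -->
    <filterCache class="solr.FastLRUCache"
                 size="512"
                 initialSize="512"
                 autowarmCount="64"/>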
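Similarly, Erick's suggested commit cadence maps onto the <updateHandler> section of solrconfig.xml roughly as follows (a sketch of his numbers, not a drop-in recommendation; maxTime is in milliseconds):

    <!-- solrconfig.xml, inside <updateHandler>: -->
    <!-- hard commit every 5 minutes, without opening a new searcher -->
    <autoCommit>
      <maxTime>300000</maxTime>
      <openSearcher>false</openSearcher>
    </autoCommit>
    <!-- soft commit every 15 seconds, making new documents searchable -->
    <autoSoftCommit>
      <maxTime>15000</maxTime>
    </autoSoftCommit>

Hard commits at this cadence keep the tlog bounded, while the soft-commit interval controls how quickly new documents become visible.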