Unfortunately, it seems (http://lucene.472066.n3.nabble.com/Nrt-and-caching-td3993612.html) that these caches are not per-segment. In this case, I want to (soft) commit less frequently. Am I right?
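For concreteness, a hard/soft commit policy like the one discussed in this
thread would live in the <updateHandler> section of solrconfig.xml, roughly
as follows. This is a minimal sketch; the intervals are placeholders taken
from the numbers mentioned in the thread, not a recommendation:

  <!-- hard commit: flushes segments and rolls the transaction log;
       openSearcher=false keeps the current searcher (and its caches) -->
  <autoCommit>
    <maxTime>300000</maxTime>   <!-- 5 minutes, in milliseconds -->
    <openSearcher>false</openSearcher>
  </autoCommit>

  <!-- soft commit: makes newly indexed documents visible to searches -->
  <autoSoftCommit>
    <maxTime>600000</maxTime>   <!-- 10 minutes: the freshness requirement below -->
  </autoSoftCommit>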
Tomás, as the fieldValueCache is very similar to Lucene's FieldCache, I
guess it contributes significantly to the time of standard (not only
faceted) queries. The SolrWiki claims that it is primarily used by
faceting. What does that say about complex textual queries?

documentCache: Erick, after query processing is finished, don't some
documents stay in the documentCache? Can't I use it to accelerate queries
that have to retrieve stored fields of documents? In that case, a bigger
documentCache can hold more documents.

About commit frequency:
HardCommit: openSearcher=false seems like a nice solution (see the sketch
above). Where can I read about this? (I found nothing but one unexplained
sentence in the SolrWiki.)
SoftCommit: In my case, the required index freshness is 10 minutes. The
plan to soft commit every 10 minutes is similar to storing all of the
documents in a queue (outside of Solr) and indexing them in bulk every 10
minutes.

Thanks.

On Fri, Jan 18, 2013 at 2:15 AM, Tomás Fernández Löbbe <
tomasflo...@gmail.com> wrote:

> I think fieldValueCache is not per segment, only fieldCache is. However,
> unless I'm missing something, this cache is only used for faceting on
> multivalued fields.
>
> On Thu, Jan 17, 2013 at 8:58 PM, Erick Erickson <erickerick...@gmail.com>
> wrote:
>
> > filterCache: This is bounded by (maxDoc) / 8 * (num filters in cache)
> > bytes. Notice the /8. This reflects the fact that the filters are
> > represented by a bitset on the _internal_ Lucene ID. UniqueId has no
> > bearing here whatsoever. This is, in a nutshell, why warming is
> > required: the internal Lucene IDs may change. Note also that it's
> > maxDoc; the internal arrays have "holes" for deleted documents.
> >
> > Note this is an _upper_ bound; if there are only a few docs that match,
> > the size will be (num of matching docs) * sizeof(int).
> >
> > fieldValueCache: I don't think so, although I'm a bit fuzzy on this. It
> > depends on whether these are "per-segment" caches or not. Any
> > "per-segment" cache is still valid.
> >
> > Think of documentCache as intended to hold the stored fields while
> > various components operate on them, thus avoiding repeatedly fetching
> > the data from disk. It's _usually_ not too big a worry.
> >
> > About hard commits once a day: that's _extremely_ long. Think instead
> > of committing more frequently with openSearcher=false. If nothing else,
> > your transaction log will grow lots and lots and lots. I'm thinking on
> > the order of 15 minutes, or possibly even much less, with softCommits
> > happening more often, maybe every 15 seconds. In fact, I'd start out
> > with soft commits every 15 seconds and hard commits (openSearcher=false)
> > every 5 minutes. The problem with hard commits being once a day is
> > that, if for any reason the server is interrupted, on startup Solr will
> > try to replay the entire transaction log to assure index integrity. Not
> > to mention that your tlog will be huge. Not to mention that there is
> > some memory usage for each document in the tlog. Hard commits roll over
> > the tlog, flush the in-memory tlog pointers, close index segments, etc.
> >
> > Best
> > Erick
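To put numbers on Erick's bound for the index described below (maxDoc =
10M): a cached filter stored as a bitset costs 10,000,000 / 8 = 1.25 MB,
so a full filterCache with, say, 512 entries tops out around 640 MB, while
a sparse filter kept as an int array costs about 4 bytes per matching
document. Note that the cache itself is sized in entries, not bytes, in
the <query> section of solrconfig.xml; a minimal sketch, with illustrative
sizes that are assumptions rather than tuning advice:

  <!-- size = max entries (distinct filters); autowarmCount = how many
       cached filters are re-executed against each new searcher -->
  <filterCache class="solr.FastLRUCache" size="512" initialSize="512"
               autowarmCount="128"/>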
> >
> > On Thu, Jan 17, 2013 at 1:29 PM, Isaac Hebsh <isaac.he...@gmail.com>
> > wrote:
> >
> > > Hi,
> > >
> > > I am going to build a big Solr (4.0?) index, which holds some dozens
> > > of millions of documents. Each document has some dozens of fields,
> > > and one big textual field.
> > > The queries on the index are non-trivial and a little bit long (might
> > > be hundreds of terms). No query is identical to another.
> > >
> > > Now, I want to analyze the cache performance (before setting up the
> > > whole environment), in order to estimate how much RAM I will need.
> > >
> > > filterCache:
> > > In my scenario, every query has some filters. Let's say that each
> > > filter matches 1M documents, out of 10M. Should the estimated memory
> > > usage be 1M * sizeof(uniqueId) * num-of-filters-in-cache?
> > >
> > > fieldValueCache:
> > > Due to the difference between queries, I guess that fieldValueCache
> > > is the most important factor in query performance. Here comes a
> > > generic question: I'm indexing new documents constantly. Soft commits
> > > will be performed every 10 minutes. Does that mean the cache is
> > > meaningless after every 10 minutes?
> > >
> > > documentCache:
> > > enableLazyFieldLoading will be enabled, and "fl" contains a very
> > > small set of fields. BUT, I need to return highlighting on about
> > > (possibly) 20 fields. Does the highlighting component use the
> > > documentCache? I guess that highlighting requires the whole field to
> > > be loaded into the documentCache. Will that happen only for fields
> > > that matched a term from the query?
> > >
> > > And one more question: I'm planning to hard-commit once a day. Should
> > > I prepare for significant RAM usage growth between hard commits?
> > > (Consider a lot of new documents in this period...)
> > > Does this RAM come from the same pool as the caches? Can an
> > > OutOfMemory exception happen in this scenario?
> > >
> > > Thanks a lot.
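For reference, the documentCache and lazy field loading discussed in the
original question are also configured in the <query> section of
solrconfig.xml; a minimal sketch, with sizes that are again only
illustrative assumptions:

  <!-- holds stored fields, keyed by internal Lucene doc ID; autowarming
       is pointless here because internal IDs change when a new searcher
       opens (the same reason the other caches need re-warming) -->
  <documentCache class="solr.LRUCache" size="512" initialSize="512"
                 autowarmCount="0"/>

  <!-- load only the stored fields a request actually uses
       (fl, highlighting), instead of the whole document -->
  <enableLazyFieldLoading>true</enableLazyFieldLoading>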