No, the fieldValueCache is not used for resolving queries, only for multi-token faceting, and apparently for the stats component too. The documentCache keeps in memory the stored content of the fields you are retrieving or highlighting on. It will only get hits when the same document matches the query multiple times and the same fields are requested but, as Erick said, it is important for cases where multiple components in the same request need to access the same data.
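For reference, here is a minimal sketch of how the caches mentioned above are declared in the <query> section of solrconfig.xml. The element and class names are standard Solr 4.x configuration; the sizes are illustrative assumptions only, not recommendations:

    <query>
      <!-- stored fields fetched for response writing and highlighting -->
      <documentCache class="solr.LRUCache"
                     size="512" initialSize="512" autowarmCount="0"/>

      <!-- used for faceting on multi-valued fields (and the stats component);
           created implicitly even if you do not declare it -->
      <fieldValueCache class="solr.FastLRUCache"
                       size="64" autowarmCount="0" showItems="16"/>

      <!-- only load the stored fields actually requested in "fl" -->
      <enableLazyFieldLoading>true</enableLazyFieldLoading>
    </query>

Note that the documentCache is keyed by internal Lucene document IDs, which change whenever a new searcher is opened, so it cannot be autowarmed.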
I think soft committing every 10 minutes is totally fine, but you should hard commit more often if you are going to be using the transaction log. openSearcher=false essentially tells Solr not to open a new searcher after the (hard) commit, so you won't see the newly indexed data and the caches won't be flushed. openSearcher=false makes sense when you are using hard commits together with soft commits: since the soft commits deal with opening/closing searchers, you don't need the hard commits to do it.

Tomás

On Fri, Jan 18, 2013 at 2:20 AM, Isaac Hebsh <isaac.he...@gmail.com> wrote:

> Unfortunately, it seems (http://lucene.472066.n3.nabble.com/Nrt-and-caching-td3993612.html) that these caches are not per-segment. In this case, I want to (soft) commit less frequently. Am I right?
>
> Tomás, as the fieldValueCache is very similar to Lucene's FieldCache, I guess it contributes a lot to the time of standard (not only faceted) queries. SolrWiki claims that it is primarily used by faceting. What does that say about complex textual queries?
>
> documentCache:
> Erick, after query processing is finished, don't some documents stay in the documentCache? Can't I use it to accelerate queries that retrieve stored fields of documents? In that case, a big documentCache could hold more documents.
>
> About commit frequency:
> HardCommit: "openSearcher=false" seems like a nice solution. Where can I read about this? (I found nothing but one unexplained sentence in SolrWiki.)
> SoftCommit: In my case, the required index freshness is 10 minutes. The plan to soft commit every 10 minutes is similar to storing all of the documents in a queue (outside of Solr) and indexing them in bulk every 10 minutes.
>
> Thanks.
>
> On Fri, Jan 18, 2013 at 2:15 AM, Tomás Fernández Löbbe <tomasflo...@gmail.com> wrote:
>
> > I think fieldValueCache is not per segment, only fieldCache is. However, unless I'm missing something, this cache is only used for faceting on multivalued fields.
> >
> > On Thu, Jan 17, 2013 at 8:58 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> >
> > > filterCache: This is bounded by (maxDoc / 8) * (num filters in cache). Notice the /8. This reflects the fact that the filters are represented by a bitset on the _internal_ Lucene ID. The uniqueId has no bearing here whatsoever. This is, in a nutshell, why warming is required: the internal Lucene IDs may change. Note also that it's maxDoc; the internal arrays have "holes" for deleted documents.
> > >
> > > Note this is an _upper_ bound; if there are only a few docs that match, the size will be (num of matching docs) * sizeof(int).
> > >
> > > fieldValueCache: I don't think so, although I'm a bit fuzzy on this. It depends on whether these are "per-segment" caches or not. Any "per-segment" cache is still valid.
> > >
> > > Think of the documentCache as intended to hold the stored fields while various components operate on them, thus avoiding repeatedly fetching the data from disk. It's _usually_ not too big a worry.
> > >
> > > About hard commits once a day: that's _extremely_ long. Think instead of committing more frequently with openSearcher=false. If nothing else, your transaction log will grow lots and lots and lots. I'm thinking on the order of 15 minutes, or possibly even much less, with softCommits happening more often, maybe every 15 seconds.
> > > In fact, I'd start out with soft commits every 15 seconds and hard commits (openSearcher=false) every 5 minutes. The problem with hard commits being once a day is that, if for any reason the server is interrupted, on startup Solr will try to replay the entire transaction log to assure index integrity. Not to mention that your tlog will be huge. Not to mention that there is some memory usage for each document in the tlog. Hard commits roll over the tlog, flush the in-memory tlog pointers, close index segments, etc.
> > >
> > > Best
> > > Erick
> > >
> > > On Thu, Jan 17, 2013 at 1:29 PM, Isaac Hebsh <isaac.he...@gmail.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > I am going to build a big Solr (4.0?) index, which will hold some dozens of millions of documents. Each document has some dozens of fields, and one big textual field.
> > > > The queries on the index are non-trivial and a little bit long (might be hundreds of terms). No query is identical to another.
> > > >
> > > > Now, I want to analyze the cache performance (before setting up the whole environment), in order to estimate how much RAM I will need.
> > > >
> > > > filterCache:
> > > > In my scenario, every query has some filters. Let's say that each filter matches 1M documents, out of 10M. Should the estimated memory usage be 1M * sizeof(uniqueId) * num-of-filters-in-cache?
> > > >
> > > > fieldValueCache:
> > > > Due to the difference between queries, I guess that fieldValueCache is the most important factor in query performance. Here comes a generic question: I'm indexing new documents to the index constantly. Soft commits will be performed every 10 minutes. Does that mean the cache is meaningless after every 10 minutes?
> > > >
> > > > documentCache:
> > > > enableLazyFieldLoading will be enabled, and "fl" contains a very small set of fields. BUT, I need to return highlighting on about (possibly) 20 fields. Does the highlighting component use the documentCache? I guess that highlighting requires the whole field to be loaded into the documentCache. Will that happen only for fields that matched a term from the query?
> > > >
> > > > And one more question: I'm planning to hard-commit once a day. Should I prepare for significant RAM usage growth between hard commits? (Consider a lot of new documents in this period...) Does this RAM come from the same pool as the caches? Can an OutOfMemory exception happen in this scenario?
> > > >
> > > > Thanks a lot.
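To make the commit discussion concrete, here is a sketch of the relevant <updateHandler> settings in solrconfig.xml, using intervals mentioned in the thread (hard commit every 5 minutes with openSearcher=false, soft commit every 10 minutes to match the required freshness). The elements are standard Solr 4.x configuration, but the intervals are just a starting point, not a recommendation:

    <updateHandler class="solr.DirectUpdateHandler2">
      <updateLog>
        <str name="dir">${solr.ulog.dir:}</str>
      </updateLog>

      <!-- hard commit: flushes segments and rolls over the tlog,
           but does not open a new searcher -->
      <autoCommit>
        <maxTime>300000</maxTime>          <!-- 5 minutes -->
        <openSearcher>false</openSearcher>
      </autoCommit>

      <!-- soft commit: opens a new searcher, making new documents visible -->
      <autoSoftCommit>
        <maxTime>600000</maxTime>          <!-- 10 minutes -->
      </autoSoftCommit>
    </updateHandler>

With this setup the searcher is only reopened on soft commits, so caches are invalidated on the soft-commit schedule, while the hard commits keep the tlog and uncommitted segments bounded.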
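And a rough worked example of Erick's filterCache bound, using the numbers from the original question (maxDoc ≈ 10M) and assuming the bitset representation and a 512-entry filterCache (512 is only an assumed cache size here):

    one cached filter   <= maxDoc / 8 bytes = 10,000,000 / 8 ≈ 1.25 MB
    512 cached filters  <= 512 * 1.25 MB ≈ 640 MB (worst case)
    a sparse filter (say 1,000 matches) ≈ 1,000 * sizeof(int) = 4 KB

The entry size depends on maxDoc, not on the unique key, so a filter matching 1M of the 10M documents still costs about 1.25 MB, not 1M * sizeof(uniqueId).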