About your documentCache question: typically it has a pretty low hit
ratio; I've rarely, if ever, seen it get hit very often. And remember
that this cache is only hit when assembling the response for a few
documents (your page size).

Bottom line: I wouldn't worry about this cache much. It's quite useful
for processing a particular query faster, but not really intended for
cross-query use.

Really, I think you're putting the cart before the horse here. Run it
up the flagpole and try it. Rely on the OS to do its job
(http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html).
Find a bottleneck, _then_ tune. Premature optimization and all
that....
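
(If you ever want to be explicit about it rather than rely on the
default, here's a sketch of the relevant solrconfig.xml line, assuming
a 64-bit JVM:

<directoryFactory name="DirectoryFactory"
                  class="${solr.directoryFactory:solr.MMapDirectoryFactory}"/>

On 64-bit platforms the default directory factory already ends up
using MMapDirectory, so this is usually unnecessary.)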

Several tens of millions of docs isn't that large unless the text
fields are enormous.

Best
Erick

On Sat, Jan 19, 2013 at 2:32 PM, Isaac Hebsh <isaac.he...@gmail.com> wrote:
> Ok. Thank you everyone for your helpful answers.
> I understand that fieldValueCache is not used for resolving queries.
> Is there any cache that can help in this basic scenario (a lot of
> different queries, over a small set of fields)?
> Does Lucene's FieldCache help (implicitly)?
> How can I use RAM to reduce I/O for this type of query?
>
>
> On Fri, Jan 18, 2013 at 4:09 PM, Tomás Fernández Löbbe <
> tomasflo...@gmail.com> wrote:
>
>> No, the fieldValueCache is not used for resolving queries, only for
>> multi-token faceting and apparently for the stats component too. The
>> documentCache maintains in memory the stored content of the fields you
>> are retrieving or highlighting on. It'll hit if the same document is
>> requested multiple times with the same fields, but as Erick said, it is
>> mostly important for cases when multiple components in the same request
>> need to access the same data.
>> I think soft committing every 10 minutes is totally fine, but you should
>> hard commit more often if you are going to be using the transaction log.
>> openSearcher=false essentially tells Solr not to open a new searcher
>> after the (hard) commit, so you won't see the newly indexed data and the
>> caches won't be flushed. openSearcher=false makes sense when you are
>> using hard commits together with soft commits: since the soft commits
>> take care of opening/closing searchers, you don't need the hard commits
>> to do it.
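>>
>> Just as a sketch of what that combination could look like in
>> solrconfig.xml (the intervals are placeholders: the 10-minute soft
>> commit matches the freshness you mentioned, and the 1-minute hard
>> commit is only an example value to keep the tlog small):
>>
>> <updateHandler class="solr.DirectUpdateHandler2">
>>   <updateLog>
>>     <str name="dir">${solr.ulog.dir:}</str>
>>   </updateLog>
>>   <autoCommit>
>>     <maxTime>60000</maxTime>            <!-- hard commit every minute -->
>>     <openSearcher>false</openSearcher>  <!-- don't open a new searcher -->
>>   </autoCommit>
>>   <autoSoftCommit>
>>     <maxTime>600000</maxTime>           <!-- soft commit every 10 minutes -->
>>   </autoSoftCommit>
>> </updateHandler>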
>>
>> Tomás
>>
>>
>> On Fri, Jan 18, 2013 at 2:20 AM, Isaac Hebsh <isaac.he...@gmail.com>
>> wrote:
>>
>> > Unfortunately, it seems (
>> > http://lucene.472066.n3.nabble.com/Nrt-and-caching-td3993612.html) that
>> > these caches are not per-segment. In this case, I want to (soft) commit
>> > less frequently. Am I right?
>> >
>> > Tomás, as the fieldValueCache is very similar to Lucene's FieldCache, I
>> > guess it contributes a lot to standard (not only faceted) query time.
>> > The SolrWiki claims that it is primarily used by faceting. What does
>> > that say about complex textual queries?
>> >
>> > documentCache:
>> > Erick, after query processing is finished, don't some documents stay in
>> > the documentCache? Can't I use it to accelerate queries that need to
>> > retrieve stored fields of documents? In this case, a big documentCache
>> > can hold more documents.
>> >
>> > About commit frequency:
>> > HardCommit: "openSearcher=false" seems like a nice solution. Where can I
>> > read about this? (I found nothing but one unexplained sentence in the
>> > SolrWiki.)
>> > SoftCommit: In my case, the required index freshness is 10 minutes. The
>> > plan to soft commit every 10 minutes is similar to storing all of the
>> > documents in a queue (outside of Solr) and indexing them in bulk every
>> > 10 minutes.
>> >
>> > Thanks.
>> >
>> >
>> > On Fri, Jan 18, 2013 at 2:15 AM, Tomás Fernández Löbbe <
>> > tomasflo...@gmail.com> wrote:
>> >
>> > > I think fieldValueCache is not per segment, only fieldCache is.
>> > > However, unless I'm missing something, this cache is only used for
>> > > faceting on multivalued fields.
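>> > >
>> > > (For what it's worth, it can also be declared explicitly in the <query>
>> > > section of solrconfig.xml to control its size and inspect what lands in
>> > > it; the numbers below are just placeholders:
>> > >
>> > > <fieldValueCache class="solr.FastLRUCache"
>> > >                  size="512"
>> > >                  autowarmCount="128"
>> > >                  showItems="32"/>
>> > >
>> > > If it isn't declared, Solr creates one implicitly anyway.)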
>> > >
>> > >
>> > > On Thu, Jan 17, 2013 at 8:58 PM, Erick Erickson
>> > > <erickerick...@gmail.com> wrote:
>> > >
>> > > > filterCache: This is bounded by (maxDoc / 8) * (num filters in
>> > > > cache) bytes. Notice the /8; it reflects the fact that the filters
>> > > > are represented by a bitset over the _internal_ Lucene ID. The
>> > > > uniqueId has no bearing here whatsoever. This is, in a nutshell, why
>> > > > warming is required: the internal Lucene IDs may change. Note also
>> > > > that it's maxDoc; the internal arrays have "holes" for deleted
>> > > > documents.
>> > > >
>> > > > Note this is an _upper_ bound: if only a few docs match, the size
>> > > > will be (num of matching docs) * sizeof(int).
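>> > > >
>> > > > To make the arithmetic concrete with the numbers from your scenario
>> > > > (maxDoc = 10M): a full bitset entry is about 10,000,000 / 8 ~= 1.25 MB,
>> > > > so, say, 100 cached filters cost at most around 125 MB, while a sparse
>> > > > filter matching only 50K docs costs roughly 50,000 * 4 bytes ~= 200 KB.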
>> > > >
>> > > > fieldValueCache: I don't think it becomes meaningless, although I'm a
>> > > > bit fuzzy on this. It depends on whether these are "per-segment"
>> > > > caches or not; any per-segment cache is still valid after a commit.
>> > > >
>> > > > Think of documentCache as intended to hold the stored fields while
>> > > > various components of a single request operate on them, thus avoiding
>> > > > repeatedly fetching the data from disk. It's _usually_ not too big a
>> > > > worry.
>> > > >
>> > > > About hard-commits once a day: that's _extremely_ long. Think instead
>> > > > of committing more frequently with openSearcher=false. If nothing
>> > > > else, your transaction log will grow lots and lots and lots. I'm
>> > > > thinking on the order of 15 minutes, or possibly even much less, with
>> > > > softCommits happening more often, maybe every 15 seconds. In fact, I'd
>> > > > start out with soft commits every 15 seconds and hard commits
>> > > > (openSearcher=false) every 5 minutes. The problem with hard commits
>> > > > being once a day is that, if for any reason the server is interrupted,
>> > > > on startup Solr will try to replay the entire transaction log to
>> > > > assure index integrity. Not to mention that your tlog will be huge.
>> > > > Not to mention that there is some memory usage for each document in
>> > > > the tlog. Hard commits roll over the tlog, flush the in-memory tlog
>> > > > pointers, close index segments, etc.
>> > > >
>> > > > Best
>> > > > Erick
>> > > >
>> > > > On Thu, Jan 17, 2013 at 1:29 PM, Isaac Hebsh <isaac.he...@gmail.com>
>> > > > wrote:
>> > > > > Hi,
>> > > > >
>> > > > > I am going to build a big Solr (4.0?) index, which holds some
>> > > > > dozens of millions of documents. Each document has some dozens of
>> > > > > fields, and one big textual field.
>> > > > > The queries on the index are non-trivial and a bit long (they might
>> > > > > be hundreds of terms). No query is identical to another.
>> > > > >
>> > > > > Now, I want to analyze the cache performance (before setting up the
>> > > > > whole environment), in order to estimate how much RAM I will need.
>> > > > >
>> > > > > filterCache:
>> > > > > In my scenario, every query has some filters. Let's say that each
>> > > > > filter matches 1M documents, out of 10M. Should the estimated memory
>> > > > > usage be 1M * sizeof(uniqueId) * num-of-filters-in-cache?
>> > > > >
>> > > > > fieldValueCache:
>> > > > > Due to the differences between queries, I guess that fieldValueCache
>> > > > > is the most important factor in query performance. Here comes a
>> > > > > generic question: I'm indexing new documents into the index
>> > > > > constantly. Soft commits will be performed every 10 mins. Does that
>> > > > > mean the cache is meaningless after every 10 minutes?
>> > > > >
>> > > > > documentCache:
>> > > > > enableLazyFieldLoading will be enabled, and "fl" contains a very
>> > > > > small set of fields. BUT, I need to return highlighting on about
>> > > > > (possibly) 20 fields. Does the highlighting component use the
>> > > > > documentCache? I guess that highlighting requires the whole field to
>> > > > > be loaded into the documentCache. Will it happen only for fields
>> > > > > that matched a term from the query?
>> > > > >
>> > > > > And one more question: I'm planning to hard-commit once a day.
>> > > > > Should I prepare for significant RAM usage growth between
>> > > > > hard-commits? (Consider a lot of new documents in this period...)
>> > > > > Does this RAM come from the same pool as the caches? Can an
>> > > > > OutOfMemory exception happen in this scenario?
>> > > > >
>> > > > > Thanks a lot.
>> > > >
>> > >
>> >
>>
