Unfortunately, it seems (http://lucene.472066.n3.nabble.com/Nrt-and-caching-td3993612.html) that these caches are not per-segment. In this case, I want to (soft) commit less frequently. Am I right?
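For concreteness, a hard/soft commit policy like the one discussed in this
thread would live in the <updateHandler> section of solrconfig.xml, roughly
as follows. This is a minimal sketch; the intervals are placeholders taken
from the numbers mentioned in the thread, not a recommendation:

  <!-- hard commit: flushes segments and rolls the transaction log;
       openSearcher=false keeps the current searcher (and its caches) -->
  <autoCommit>
    <maxTime>300000</maxTime>   <!-- 5 minutes, in milliseconds -->
    <openSearcher>false</openSearcher>
  </autoCommit>

  <!-- soft commit: makes newly indexed documents visible to searches -->
  <autoSoftCommit>
    <maxTime>600000</maxTime>   <!-- 10 minutes: the freshness requirement below -->
  </autoSoftCommit>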
Tomás, as the fieldValueCache is very similar to Lucene's FieldCache, I
guess it contributes significantly to the time of standard (not only
faceted) queries. The SolrWiki claims that it is primarily used by
faceting. What does that say about complex textual queries?

documentCache: Erick, after query processing is finished, don't some
documents stay in the documentCache? Can't I use it to accelerate queries
that have to retrieve stored fields of documents? In that case, a bigger
documentCache can hold more documents.

About commit frequency:
HardCommit: openSearcher=false seems like a nice solution (see the sketch
above). Where can I read about this? (I found nothing but one unexplained
sentence in the SolrWiki.)
SoftCommit: In my case, the required index freshness is 10 minutes. The
plan to soft commit every 10 minutes is similar to storing all of the
documents in a queue (outside of Solr) and indexing them in bulk every 10
minutes.

Thanks.

On Fri, Jan 18, 2013 at 2:15 AM, Tomás Fernández Löbbe <
tomasflo...@gmail.com> wrote:

> I think fieldValueCache is not per segment, only fieldCache is. However,
> unless I'm missing something, this cache is only used for faceting on
> multivalued fields.
>
> On Thu, Jan 17, 2013 at 8:58 PM, Erick Erickson <erickerick...@gmail.com>
> wrote:
>
> > filterCache: This is bounded by (maxDoc) / 8 * (num filters in cache)
> > bytes. Notice the /8. This reflects the fact that the filters are
> > represented by a bitset on the _internal_ Lucene ID. UniqueId has no
> > bearing here whatsoever. This is, in a nutshell, why warming is
> > required: the internal Lucene IDs may change. Note also that it's
> > maxDoc; the internal arrays have "holes" for deleted documents.
> >
> > Note this is an _upper_ bound; if there are only a few docs that match,
> > the size will be (num of matching docs) * sizeof(int).
> >
> > fieldValueCache: I don't think so, although I'm a bit fuzzy on this. It
> > depends on whether these are "per-segment" caches or not. Any
> > "per-segment" cache is still valid.
> >
> > Think of documentCache as intended to hold the stored fields while
> > various components operate on them, thus avoiding repeatedly fetching
> > the data from disk. It's _usually_ not too big a worry.
> >
> > About hard commits once a day: that's _extremely_ long. Think instead
> > of committing more frequently with openSearcher=false. If nothing else,
> > your transaction log will grow lots and lots and lots. I'm thinking on
> > the order of 15 minutes, or possibly even much less, with softCommits
> > happening more often, maybe every 15 seconds. In fact, I'd start out
> > with soft commits every 15 seconds and hard commits (openSearcher=false)
> > every 5 minutes. The problem with hard commits being once a day is
> > that, if for any reason the server is interrupted, on startup Solr will
> > try to replay the entire transaction log to assure index integrity. Not
> > to mention that your tlog will be huge. Not to mention that there is
> > some memory usage for each document in the tlog. Hard commits roll over
> > the tlog, flush the in-memory tlog pointers, close index segments, etc.
> >
> > Best
> > Erick
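To put numbers on Erick's bound for the index described below (maxDoc =
10M): a cached filter stored as a bitset costs 10,000,000 / 8 = 1.25 MB,
so a full filterCache with, say, 512 entries tops out around 640 MB, while
a sparse filter kept as an int array costs about 4 bytes per matching
document. Note that the cache itself is sized in entries, not bytes, in
the <query> section of solrconfig.xml; a minimal sketch, with illustrative
sizes that are assumptions rather than tuning advice:

  <!-- size = max entries (distinct filters); autowarmCount = how many
       cached filters are re-executed against each new searcher -->
  <filterCache class="solr.FastLRUCache" size="512" initialSize="512"
               autowarmCount="128"/>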
> >
> > On Thu, Jan 17, 2013 at 1:29 PM, Isaac Hebsh <isaac.he...@gmail.com>
> > wrote:
> >
> > > Hi,
> > >
> > > I am going to build a big Solr (4.0?) index, which holds some dozens
> > > of millions of documents. Each document has some dozens of fields,
> > > and one big textual field.
> > > The queries on the index are non-trivial and a little bit long (might
> > > be hundreds of terms). No query is identical to another.
> > >
> > > Now, I want to analyze the cache performance (before setting up the
> > > whole environment), in order to estimate how much RAM I will need.
> > >
> > > filterCache:
> > > In my scenario, every query has some filters. Let's say that each
> > > filter matches 1M documents, out of 10M. Should the estimated memory
> > > usage be 1M * sizeof(uniqueId) * num-of-filters-in-cache?
> > >
> > > fieldValueCache:
> > > Due to the difference between queries, I guess that fieldValueCache
> > > is the most important factor in query performance. Here comes a
> > > generic question: I'm indexing new documents constantly. Soft commits
> > > will be performed every 10 minutes. Does that mean the cache is
> > > meaningless after every 10 minutes?
> > >
> > > documentCache:
> > > enableLazyFieldLoading will be enabled, and "fl" contains a very
> > > small set of fields. BUT, I need to return highlighting on about
> > > (possibly) 20 fields. Does the highlighting component use the
> > > documentCache? I guess that highlighting requires the whole field to
> > > be loaded into the documentCache. Will that happen only for fields
> > > that matched a term from the query?
> > >
> > > And one more question: I'm planning to hard-commit once a day. Should
> > > I prepare for significant RAM usage growth between hard commits?
> > > (Consider a lot of new documents in this period...)
> > > Does this RAM come from the same pool as the caches? Can an
> > > OutOfMemory exception happen in this scenario?
> > >
> > > Thanks a lot.
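For reference, the documentCache and lazy field loading discussed in the
original question are also configured in the <query> section of
solrconfig.xml; a minimal sketch, with sizes that are again only
illustrative assumptions:

  <!-- holds stored fields, keyed by internal Lucene doc ID; autowarming
       is pointless here because internal IDs change when a new searcher
       opens (the same reason the other caches need re-warming) -->
  <documentCache class="solr.LRUCache" size="512" initialSize="512"
                 autowarmCount="0"/>

  <!-- load only the stored fields a request actually uses
       (fl, highlighting), instead of the whole document -->
  <enableLazyFieldLoading>true</enableLazyFieldLoading>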